Knowledge graphs (workflow semantics)
Note
The Local engine persists workflow knowledge graphs out of the box. Other engines
call the same helper (persist_workflow_knowledge_graph); ensure integration is
enabled for your backend.
Install Neo4j
pip install neo4j
docker run -p7474:7474 -p7687:7687 -d -e NEO4J_AUTH=neo4j/secretgraph neo4j:latest
export NODE_GRAPH_NEO4J_URI="neo4j://localhost"
export NODE_GRAPH_NEO4J_USER="neo4j"
export NODE_GRAPH_NEO4J_PASSWORD="secretgraph"
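To confirm that the variables above are visible to Python before running a workflow, a minimal sketch along these lines may help. The names `Neo4jSettings`, `read_neo4j_settings` and `check_connection` are hypothetical and not part of node-graph. The fallback values simply mirror the exports above and are assumptions, not documented library defaults. `check_connection` relies on the official `neo4j` driver and needs the container to be running.

```python
import os
from typing import Mapping, NamedTuple


class Neo4jSettings(NamedTuple):
    uri: str
    user: str
    password: str


def read_neo4j_settings(env: Mapping[str, str] = os.environ) -> Neo4jSettings:
    """Collect the three NODE_GRAPH_NEO4J_* variables (hypothetical helper)."""
    return Neo4jSettings(
        # Fallbacks mirror the export lines above; they are assumptions.
        uri=env.get("NODE_GRAPH_NEO4J_URI", "neo4j://localhost"),
        user=env.get("NODE_GRAPH_NEO4J_USER", "neo4j"),
        password=env.get("NODE_GRAPH_NEO4J_PASSWORD", "secretgraph"),
    )


def check_connection(settings: Neo4jSettings) -> None:
    """Raise if the server is unreachable or the credentials are rejected."""
    # Imported lazily so the settings helper works without the driver installed.
    from neo4j import GraphDatabase

    with GraphDatabase.driver(
        settings.uri, auth=(settings.user, settings.password)
    ) as driver:
        driver.verify_connectivity()
```

Calling `check_connection(read_neo4j_settings())` once at the start of a session surfaces a wrong password or a stopped container early, rather than at the point where a workflow tries to persist its graph.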
What gets stored
The Neo4j knowledge graph is written once per workflow version (keyed by a stable hash) and referenced from the workflow ProcessNode extras.
Semantics source:
- Socket annotations in task definitions (inputs/outputs).
- Runtime semantics relations/annotations, which are buffered internally on graph.knowledge_graph.
Merge strategy: annotations and runtime additions for the same socket are merged into a single payload. Attachments referencing sockets are resolved to lightweight references; nothing is duplicated per run.
Format: stored normalized in Neo4j (no duplicated full JSON blob):
- (:KnowledgeGraph) stores workflow/engine metadata plus namespaces_json.
- (:Socket) nodes store socket metadata (task/direction/port/label/canonical).
- [:TRIPLE] relationships encode semantic predicates, pointing either to another (:Socket) or a (:Value) literal node.
The semantics payload (namespaces, sockets, triples) is reconstructed on demand and matches Graph.knowledge_graph.to_dict(), for example:
{
  "namespaces": {
    "qudt": "http://qudt.org/schema/qudt/",
    "rdf": "...",
    "rdfs": "..."
  },
  "sockets": {
    "task.output.result": {
      "task": "task",
      "direction": "output",
      "port": "result",
      "label": "Band gap"
    }
  },
  "triples": [
    ["task.output.result", "rdf:type", "qudt:QuantityValue"],
    ["task.output.result", "rdfs:label", "Band gap"]
  ]
}
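The idea of writing the graph once per workflow version, keyed by a stable hash, can be illustrated with a small sketch. The source does not say which fields are hashed or which algorithm is used, so everything here is an assumption: `semantics_hash` and `persist_once` are hypothetical names, SHA-256 over canonical JSON is one possible choice, and the in-memory `store` dict stands in for Neo4j.

```python
import hashlib
import json


def semantics_hash(semantics):
    """Hash a semantics payload so that key order and triple order do not matter.

    Illustrative only: the hashing scheme used by the library is not documented here.
    """
    canonical = dict(semantics)
    # Sort triples by their JSON text so mixed literal types still compare cleanly.
    canonical["triples"] = sorted(semantics.get("triples", []), key=json.dumps)
    text = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


store = {}  # stand-in for the database: hash -> semantics payload


def persist_once(semantics):
    """Store the payload only if this workflow version has not been seen yet."""
    key = semantics_hash(semantics)
    store.setdefault(key, semantics)
    return key


semantics = {
    "namespaces": {"qudt": "http://qudt.org/schema/qudt/"},
    "sockets": {"task.output.result": {"task": "task", "direction": "output",
                                       "port": "result", "label": "Band gap"}},
    "triples": [["task.output.result", "rdf:type", "qudt:QuantityValue"],
                ["task.output.result", "rdfs:label", "Band gap"]],
}

first_key = persist_once(semantics)
second_key = persist_once(semantics)  # a second run of the same workflow version
```

Both calls return the same key and `store` holds a single entry, which is the property that keeps repeated runs from duplicating the graph.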
Where it is stored
Knowledge graphs are persisted to Neo4j (configure via NODE_GRAPH_NEO4J_URI,
NODE_GRAPH_NEO4J_USER, NODE_GRAPH_NEO4J_PASSWORD). The workflow
ProcessNode that created it stores the UUID under
process_node.base.extras['knowledge_graph_uuid'] for quick lookup.
Retrieving a knowledge graph
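Example (verdi shell): fetch a workflow knowledge graph using the UUID stored on the workflow node:
from node_graph_engine.neo4j.knowledge_graph import fetch_knowledge_graph
from aiida import orm

# The UUID is stored on the workflow ProcessNode under
# process_node.base.extras['knowledge_graph_uuid'].
kg_uuid = "your-knowledge-graph-uuid"
# Rebuild the semantics payload (namespaces, sockets, triples) from Neo4j.
semantics = fetch_knowledge_graph(kg_uuid)
print("Sockets:", semantics["sockets"])
print("Triples:", semantics["triples"])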
Fetch metadata + semantics in one payload:
payload = fetch_knowledge_graph(kg_uuid, include_metadata=True)
print("Workflow:", payload["workflow"])
print("Engine:", payload["engine_kind"])
print("Sockets:", payload["semantics"]["sockets"])
Use KnowledgeGraph.from_dict to rebuild the graph and visualise it directly in a Jupyter notebook:
from node_graph.knowledge.graph import KnowledgeGraph
kg = KnowledgeGraph.from_dict(payload)
kg
Notes and scope
High-throughput friendly: semantics are stored once per workflow version, not per run, avoiding repeated JSON-LD blobs.
Per-run nodes remain lean: they only carry the socket-level ontology payload (label/IRI/context/attributes). Agents should follow standard AiiDA provenance (creator process links) and, if needed, the workflow knowledge graph UUID on the workflow ProcessNode to marry runtime values with the workflow schema.
Runtime additions (relations, extra attributes) are merged with the static annotations, so user-provided additions on sockets are preserved.
Node-level knowledge snapshots are not stored; link from workflow knowledge to run nodes via provenance if you need concrete values.
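Combining runtime values with the workflow schema can be sketched as a simple join on socket IDs. This is an illustration, not a library function: `annotate_run_values` is a hypothetical helper, and `run_values` is assumed to be a dict of socket ID to concrete value that has already been collected by following AiiDA provenance (that step is not shown). The numeric value is a placeholder.

```python
def annotate_run_values(semantics, run_values):
    """Attach label and rdf:type information from the schema to each run value."""
    types_by_socket = {}
    for subject, predicate, obj in semantics["triples"]:
        if predicate == "rdf:type":
            types_by_socket.setdefault(subject, []).append(obj)

    records = []
    for socket_id, value in run_values.items():
        # Sockets missing from the schema still get a record, just without metadata.
        meta = semantics["sockets"].get(socket_id, {})
        records.append({
            "socket": socket_id,
            "label": meta.get("label"),
            "types": types_by_socket.get(socket_id, []),
            "value": value,
        })
    return records


semantics = {
    "namespaces": {"qudt": "http://qudt.org/schema/qudt/"},
    "sockets": {"task.output.result": {"task": "task", "direction": "output",
                                       "port": "result", "label": "Band gap"}},
    "triples": [["task.output.result", "rdf:type", "qudt:QuantityValue"],
                ["task.output.result", "rdfs:label", "Band gap"]],
}
run_values = {"task.output.result": 1.0}  # placeholder value for illustration

records = annotate_run_values(semantics, run_values)
```

Because the schema is stored once per workflow version, the same `semantics` payload can be reused to annotate values from every run of that version.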
Schema at a glance
Sockets: metadata for each socket (task, direction, port, label).
Triples: [subject, predicate, object] with subjects as socket IDs (task.direction.socket), predicates from ontology IRIs/CURIEs, and objects as socket IDs, IRIs, or literals.
Context: merged JSON-LD context from annotations/runtime additions plus RDF/RDFS prefixes (available via semantics['namespaces']).
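To make the schema concrete, the sketch below expands CURIEs with the namespaces map and sorts triple objects into the three kinds listed above. Both helpers are hypothetical, and the classification is a heuristic chosen for illustration: the source does not describe how the library itself tells IRIs from literals.

```python
def expand_curie(term, namespaces):
    """Expand 'prefix:local' to a full IRI when the prefix is a known namespace."""
    prefix, sep, local = term.partition(":")
    # Skip absolute IRIs such as 'http://...', whose 'local' part starts with '//'.
    if sep and prefix in namespaces and not local.startswith("//"):
        return namespaces[prefix] + local
    return term


def classify_object(obj, semantics):
    """Return 'socket', 'iri' or 'literal' for a triple object (heuristic)."""
    if isinstance(obj, str) and obj in semantics["sockets"]:
        return "socket"
    if isinstance(obj, str):
        expanded = expand_curie(obj, semantics["namespaces"])
        if expanded != obj or "://" in obj:
            return "iri"
    return "literal"


semantics = {
    "namespaces": {"qudt": "http://qudt.org/schema/qudt/"},
    "sockets": {"task.output.result": {"task": "task", "direction": "output",
                                       "port": "result", "label": "Band gap"}},
    "triples": [["task.output.result", "rdf:type", "qudt:QuantityValue"],
                ["task.output.result", "rdfs:label", "Band gap"]],
}

kinds = [classify_object(obj, semantics) for _, _, obj in semantics["triples"]]
full_iri = expand_curie("qudt:QuantityValue", semantics["namespaces"])
```

Here `kinds` is `["iri", "literal"]` and `full_iri` is `http://qudt.org/schema/qudt/QuantityValue`. A literal that happens to start with a known prefix followed by a colon would be misread as an IRI, so a stricter check is advisable for real data.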