Knowledge graphs (workflow semantics)

Note

The Local engine persists workflow knowledge graphs out of the box. Other engines call the same helper (persist_workflow_knowledge_graph); ensure integration is enabled for your backend.

Install Neo4j

pip install neo4j
docker run   -p7474:7474    -p7687:7687    -d    -e NEO4J_AUTH=neo4j/secretgraph    neo4j:latest
export NODE_GRAPH_NEO4J_URI="neo4j://localhost"
export NODE_GRAPH_NEO4J_USER="neo4j"
export NODE_GRAPH_NEO4J_PASSWORD="secretgraph"

What gets stored

  • Neo4j knowledge graph is written once per workflow version (keyed by a stable hash) and referenced from the workflow ProcessNode extras.

  • Semantics source - Socket annotations in task definitions (inputs/outputs)

  • Runtime semantics relations/annotations are buffered internally on graph.knowledge_graph.

  • Merge strategy: annotations and runtime additions for the same socket are merged into a single payload. Attachments referencing sockets are resolved to lightweight references; nothing is duplicated per run.

  • Format: stored normalized in Neo4j (no duplicated full JSON blob): - (:KnowledgeGraph) stores workflow/engine metadata plus namespaces_json. - (:Socket) nodes store socket metadata (task/direction/port/label/canonical). - [:TRIPLE] relationships encode semantic predicates, pointing either to another

    (:Socket) or a (:Value) literal node.

    • The semantics payload (namespaces, sockets, triples) is reconstructed on demand and matches Graph.knowledge_graph.to_dict().

    {
      "namespaces": {"qudt": "http://qudt.org/schema/qudt/", "rdf": "...", "rdfs": "..."},
      "sockets": {"task.output.result": {"task": "task", "direction": "output", "port": "result", "label": "Band gap"}},
      "triples": [["task.output.result", "rdf:type", "qudt:QuantityValue"], ["task.output.result", "rdfs:label", "Band gap"]]
    }
    

Where it is stored

Knowledge graphs are persisted to Neo4j (configure via NODE_GRAPH_NEO4J_URI, NODE_GRAPH_NEO4J_USER, NODE_GRAPH_NEO4J_PASSWORD). The workflow ProcessNode that created it stores the UUID under process_node.base.extras['knowledge_graph_uuid'] for quick lookup.

Retrieving a knowledge graph

Example (verdi shell) to fetch a workflow knowledge graph by UUID stored on the workflow node:

from node_graph_engine.neo4j.knowledge_graph import fetch_knowledge_graph
from aiida import orm

kg_uuid = "your-knowledge-graph-uuid"
semantics = fetch_knowledge_graph(kg_uuid)
print("Sockets:", semantics["sockets"])
print("Triples:", semantics["triples"])

Fetch metadata + semantics in one payload:

payload = fetch_knowledge_graph(kg_uuid, include_metadata=True)
print("Workflow:", payload["workflow"])
print("Engine:", payload["engine_kind"])
print("Sockets:", payload["semantics"]["sockets"])

Use KnowledgeGraph.from_dict and visualising directly inside Jupyter-notebook:

from node_graph.knowledge.graph import KnowledgeGraph

kg = KnowledgeGraph.from_dict(payload)
kg

Notes and scope

  • High-throughput friendly: semantics are stored once per workflow version, not per run, avoiding repeated JSON-LD blobs.

  • Per-run nodes remain lean: they only carry the socket-level ontology payload (label/IRI/context/attributes). Agents should follow standard AiiDA provenance (creator process links) and, if needed, the workflow knowledge graph UUID on the workflow ProcessNode to marry runtime values with the workflow schema.

  • Runtime additions (relations, extra attributes) are merged with the static annotations, so user-provided additions on sockets are preserved.

  • Node-level knowledge snapshots are not stored; link from workflow knowledge to run nodes via provenance if you need concrete values.

Schema at a glance

  • Sockets: metadata for each socket (task, direction, port, label).

  • Triples: [subject, predicate, object] with subjects as socket IDs (task.direction.socket), predicates from ontology IRIs/CURIEs, and objects as socket IDs, IRIs, or literals.

  • Context: merged JSON-LD context from annotations/runtime additions plus RDF/RDFS prefixes (available via semantics['namespaces']).