Ontology-aware semantics
node-graph Engine already records rich structural provenance through AiiDA
link types, but structure alone does not explain what a piece of data
means. The ontology-aware semantics feature lets you attach domain
vocabularies to sockets and automatically export that context alongside
individual AiiDA Data nodes as JSON-LD snippets. This page provides
enough background for readers new to ontologies, explains how the
implementation works, and finishes with a runnable example you can adapt to
your plugins.
Why ontologies and semantics?
Ontology (plain words): a curated dictionary of concepts and their relationships. Scientific ontologies describe things like “potential energy,” “graphene,” or “defect” and standardise how they are referenced. Examples:
QUDT (Quantities, Units, Dimensions and Data) – defines physical quantities and units, e.g. qudt:PotentialEnergy or qudt-unit:EV.
PROV-O (W3C Provenance Ontology) – describes provenance concepts such as prov:Entity (a data artefact), prov:Activity (a process), and prov:used (an input relation).
OBO / NOMAD / in-house schemas – any controlled vocabulary you care about can be referenced, whether public or private.
Semantics: the act of tagging actual data with ontology identifiers so machines can tell what a number represents. Two floating-point values become distinct once one is tagged as “cohesive energy in eV” and the other as “temperature in K”.
Benefits, even if you have zero ontology experience today:
Interoperability – exports from your workflows can be ingested by ELNs, data portals, or SPARQL endpoints without custom glue code.
Queryability – by recording predicates like qudt:unit or schema:material you can answer questions such as “show me all workflows that emitted a cohesive energy in eV during March”.
Traceability – annotations become machine-readable documentation explaining why a port exists and how to interpret it.
Why structural provenance alone is insufficient
AiiDA’s native provenance (INPUT_CALC, INPUT_WORK, CREATE,
CALL_CALC) already captures who-used-what. However, those links do
not store:
The physical meaning of a socket (is result energy, force, magnetisation?).
Units, reference systems, or ontology terms.
Relationships to external datasets, materials IDs, or DOIs.
When you share an AiiDA export the graph is consistent, but collaborators still need tribal knowledge to interpret each port. Ontology annotations let you declare the domain semantics directly at authoring time so the meaning travels with the data.
How node-graph maps annotations into JSON-LD snippets
The feature builds on the existing provenance recorder:
Collect annotations – every socket’s meta.semantics payload is inspected. Payloads under semantics (or shorthand keys like iri/label) are normalised into internal SemanticsAnnotation objects that remember ontology IDs, RDF types, namespace prefixes, custom attributes, and relations.
Observe execution – when a task finishes, Graph flattens its outputs and matches socket paths (result, stress__xx…) against the stored annotations. It performs the same matching on incoming INPUT_CALC/INPUT_WORK links so consumer nodes can record what they used.
Emit JSON-LD snippets – for each annotated socket the engine builds a small JSON-LD payload (containing @context, @id, @type, and any predicates you supplied) and stores it alongside the Data node that travelled through that socket.
Resolve cross-socket references – relation values can include dotted socket paths like "outputs.band" that the engine rewrites to aiida://node/... references pointing to sibling sockets. This lets you say things like “this StructureData input has the BandStructureData produced by the graph” without duplicating provenance.
Persist – extras are appended directly to the produced/consumed Data nodes under node.base.extras['semantics'] as a list of records, so the annotations remain available even after exporting the provenance.
The result is a per-node JSON-LD breadcrumb that semantic tooling can ingest while AiiDA continues to guarantee structural integrity.
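To make the "Observe execution" step more concrete, here is a minimal sketch of flattening nested outputs into socket paths and pairing them with stored annotations. It assumes nested namespaces are joined with a double underscore (as the stress__xx example suggests) and that annotations are keyed by those flattened paths; flatten_outputs and match_annotations are hypothetical helpers for illustration, not node-graph API.

```python
from typing import Any, Dict, Tuple


def flatten_outputs(outputs: Dict[str, Any], sep: str = "__", prefix: str = "") -> Dict[str, Any]:
    """Flatten nested output dictionaries into socket paths such as 'stress__xx'."""
    flat: Dict[str, Any] = {}
    for key, value in outputs.items():
        path = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into namespaces so every leaf value gets its own path.
            flat.update(flatten_outputs(value, sep=sep, prefix=path))
        else:
            flat[path] = value
    return flat


def match_annotations(
    outputs: Dict[str, Any], annotations: Dict[str, Dict[str, Any]]
) -> Dict[str, Tuple[Any, Dict[str, Any]]]:
    """Pair each flattened output value with the annotation stored for its socket path."""
    flat = flatten_outputs(outputs)
    return {path: (value, annotations[path]) for path, value in flat.items() if path in annotations}


# Illustrative values: only 'result' and 'stress__xx' carry annotations.
outputs = {"result": -0.0015, "stress": {"xx": 0.1, "yy": 0.2}}
annotations = {
    "result": {"label": "Cohesive energy"},
    "stress__xx": {"label": "Stress, xx component"},
}
matched = match_annotations(outputs, annotations)
print(sorted(matched))  # ['result', 'stress__xx']
```

Unannotated leaves such as stress__yy are simply skipped, so only sockets you explicitly described produce JSON-LD records.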
Feature summary and use cases
Socket annotations drive everything – keep using
Float/Dictnodes; annotate sockets viameta(semantics={...})to add semantics.Namespace merging – per-socket
contextdictionaries define prefixes ({"qudt": "http://qudt.org/schema/qudt/"}); the engine merges them into the JSON-LD@contextautomatically.Flexible attributes/relations – use
semantics.attributesfor predicate/value pairs (units, uncertainties, DOIs) andsemantics.relationsfor references to other resources (values can themselves include{"@id": ...}).Per-node storage – input and output
Datanodes receive their respective semantics payload, enabling provenance-aware database queries without extra joins. Because relation values can reference other sockets via dotted paths (e.g."outputs.result"), you can declare facts like “this workflow input has the band structure produced downstream” without copying process metadata into extras.Typed authoring – pass
SemanticTag(a Pydantic model) with your own enums instead of raw dictionaries to get IDE autocompletion and validation. Known prefixes automatically pull in their@contextURLs, soqudt:unitdoes not require repeating the namespace.Context defaults – the engine ships with a small namespace registry (
qudt,qudt-unit,prov,schema). If a predicate or IRI uses one of those prefixes and nocontextentry is provided, the registry value is injected. Extend or override globally viaregister_namespace(prefix, iri)or per-annotation by settingcontextexplicitly.Engine-agnostic – the same annotations work across Local, Airflow, Dask, remote PythonJob, etc.
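The namespace-merging and context-default rules can be pictured as a small lookup: explicit context entries win, and registry values are injected only for prefixes that are actually used but not declared. This is an illustrative re-implementation under assumptions, not the engine's code. DEFAULT_NAMESPACES and build_context are hypothetical names; the qudt and qudt-unit IRIs are the ones used on this page, while the prov and schema IRIs are assumed to be the public PROV-O and schema.org namespaces. In practice, extend the real registry with register_namespace.

```python
from typing import Dict, Iterable, Optional

# Hypothetical default registry; the engine's real values may differ.
DEFAULT_NAMESPACES: Dict[str, str] = {
    "qudt": "http://qudt.org/schema/qudt/",
    "qudt-unit": "http://qudt.org/vocab/unit/",
    "prov": "http://www.w3.org/ns/prov#",
    "schema": "https://schema.org/",
}


def prefix_of(term: str) -> Optional[str]:
    """Return the prefix of a compact IRI like 'qudt:unit', or None for full IRIs."""
    if ":" not in term or term.startswith(("http://", "https://")):
        return None
    return term.split(":", 1)[0]


def build_context(terms: Iterable[str], explicit: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Merge explicit prefixes with registry defaults for every prefix actually used."""
    context = dict(explicit or {})
    for term in terms:
        prefix = prefix_of(term)
        # Explicit entries win; registry values are only injected when missing.
        if prefix and prefix not in context and prefix in DEFAULT_NAMESPACES:
            context[prefix] = DEFAULT_NAMESPACES[prefix]
    return context


ctx = build_context(
    ["qudt:unit", "qudt-unit:EV", "mat:hasProperty", "https://example.org/x"],
    explicit={"mat": "https://example.org/mat#"},
)
print(ctx)
```

Prefixes that are neither declared nor registered (for example a private mat: vocabulary without a context entry) stay unresolved, which is why custom vocabularies still need an explicit context.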
Declaring cross-socket statements
Multi-step workflows often derive properties in later nodes, but you may
want to attach those properties to an earlier artefact such as the
structure that kicked off the pipeline. Use dotted socket paths inside
semantics.relations (or attributes) to point at the socket that
carries the property. During execution the engine replaces the path with
an aiida://node/... reference, so the subject Data node keeps a
live link to the derived artefact. The references are scoped to the
inputs and outputs of the current task, avoiding any hard-coded
downstream consumer knowledge.
from typing import Annotated, Any

from node_graph import task
from node_graph.socket_spec import meta, namespace

STRUCTURE_SEMANTICS = meta(
    semantics={
        "label": "Crystal structure",
        "context": {"mat": "https://example.org/mat#"},
        "relations": {
            "mat:hasProperty": [
                {
                    # Dotted path to the 'result' output of this graph task;
                    # rewritten to an aiida://node/... reference at run time.
                    "socket": "outputs.result",
                    "label": "Band structure property",
                    "context": {"mat": "https://example.org/mat#"},
                }
            ],
        },
    }
)


@task()
def compute_band_structure(structure):
    # Placeholder for a real band-structure calculation.
    return 1.0


@task.graph()
def workflow(structure: Annotated[str, STRUCTURE_SEMANTICS]):
    return compute_band_structure(structure=structure).result
Running workflow records any semantics declared on the output as usual,
and also adds a mat:hasProperty relation to the input structure that
points at the node produced on the result output, carrying the supplied
label. In a real workflow that node would be a BandStructureData; in this
toy example compute_band_structure simply returns 1.0. This makes queries
like “give me every StructureData with a band structure property” possible
without encoding workflow-specific knowledge.
Executing the snippet below prints the semantics stored on the input
structure node, which is where the resolved aiida://node reference is recorded:
from node_graph_engine.engines.local import LocalEngine
from aiida import load_profile, orm
import json
load_profile()
graph = workflow.build(structure="test")
engine = LocalEngine()
outputs = engine.run(graph)
structure_node = orm.load_node(engine._graph_pid).inputs.structure
semantics_payload = structure_node.base.extras.all.get("semantics_ref")
print(json.dumps(semantics_payload, indent=2))
Running task: compute_band_structure
Call kwargs: {'structure': 'test'}
Persisting workflow knowledge for process node 26
null
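The path-to-reference rewrite described above can be sketched as a recursive walk over the relations payload. Assumptions: a relation target is either a bare dotted string such as "outputs.result" or a dict with a "socket" key (both forms appear on this page); the caller supplies a mapping from socket path to node identifier for the current task only; and whether the text after aiida://node/ is a PK or a UUID is left open, so placeholder strings are used. resolve_relations is a hypothetical helper, not the engine's implementation.

```python
from typing import Any, Dict

AIIDA_NODE = "aiida://node/{}"


def resolve_relations(value: Any, sockets: Dict[str, str]) -> Any:
    """Recursively replace socket paths with {'@id': 'aiida://node/<id>'} references."""
    if isinstance(value, str) and value in sockets:
        # Bare dotted path such as "outputs.result".
        return {"@id": AIIDA_NODE.format(sockets[value])}
    if isinstance(value, dict):
        if "socket" in value:
            path = value["socket"]
            if path not in sockets:
                # Only inputs/outputs of the current task are in scope.
                raise KeyError(f"unknown socket path: {path}")
            # Keep extra keys such as 'label'; swap 'socket' for '@id'.
            resolved = {k: v for k, v in value.items() if k != "socket"}
            resolved["@id"] = AIIDA_NODE.format(sockets[path])
            return resolved
        return {k: resolve_relations(v, sockets) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve_relations(item, sockets) for item in value]
    return value


relations = {
    "mat:hasProperty": [{"socket": "outputs.result", "label": "Band structure property"}],
    "schema:isBasedOn": {"@id": "https://materialsproject.org/materials/mp-149"},
}
# Placeholder identifiers standing in for the real node identifiers.
sockets = {"inputs.structure": "node-a", "outputs.result": "node-b"}
print(resolve_relations(relations, sockets))
```

External {"@id": ...} targets pass through unchanged, and an unknown socket path fails loudly instead of silently producing a dangling reference.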
Attaching semantics inside workflows
The cross-socket references above work when the subject and object sockets
are part of the same task. When the sockets belong to different nodes,
you can use the attach_semantics helper function to append
relationships at runtime.
Call attach_semantics with a predicate, the subject, and
one or more property sockets. The helper records the intent on the
graph object and resolves the referenced sockets to aiida://node/...
identifiers once the workflow has finished.
from node_graph.semantics import attach_semantics


@task()
def generate(structure):
    return structure


@task()
def compute_density_of_states(structure):
    return 1.0


@task.graph()
def workflow(
    structure,
) -> Annotated[dict, namespace(output_structure=Any, bands=Any, dos=Any)]:
    mutated = generate(structure=structure).result
    bands = compute_band_structure(structure=mutated).result
    dos = compute_density_of_states(structure=mutated).result
    attach_semantics(
        mutated,
        objects=[bands, dos],
        predicate="emmo:hasProperty",
        semantics={"label": "Generated structure", "iri": "emmo:Material"},
        label="Generated structure",
        context={"emmo": "https://emmo.info/emmo#"},
        socket_label="result",
    )
    return {"output_structure": mutated, "bands": bands, "dos": dos}
label and context describe the JSON-LD record you attach (not the
individual relation targets). socket_label selects the socket on the
subject node that the metadata is associated with, in this case the
result output of generate. Relation targets resolve to lightweight
aiida://node/... references; their display label is picked from any
semantics already stored on that node, or the node/process label as a
fallback.
graph = workflow.build(structure="test")
engine = LocalEngine()
outputs = engine.run(graph)
print(
    json.dumps(
        outputs["output_structure"].base.extras.all.get("semantics_ref"),
        indent=2,
    )
)
Running task: generate
Call kwargs: {'structure': 'test'}
Running task: compute_band_structure
Call kwargs: {'structure': 'test'}
Running task: compute_density_of_states
Call kwargs: {'structure': 'test'}
Persisting workflow knowledge for process node 30
null
When building the graph, the AiiDA data is not yet available, so we pass
the sockets themselves as arguments.
After execution, the engine converts any Data objects passed as relation
targets into aiida://node/... references and appends or replaces the
corresponding JSON-LD entry on the subject node.
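The two-phase behaviour (record the intent at build time, resolve it after execution, then append or replace on the subject) could look roughly like this. It is a hypothetical sketch, not node-graph's implementation: sockets are represented by plain string handles, the node identifiers are placeholders, and "replace" is assumed to mean overwriting an existing record that has the same socket_label and predicate. PendingSemantics and upsert_record are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class PendingSemantics:
    """Intent recorded at graph-build time, before any AiiDA node exists (hypothetical)."""

    subject: str            # handle of the subject socket; selects whose extras get the record
    objects: List[str]      # handles of the property sockets
    predicate: str
    socket_label: str = "result"
    context: Dict[str, str] = field(default_factory=dict)

    def resolve(self, node_ids: Dict[str, str]) -> Dict[str, Any]:
        """After execution, turn socket handles into aiida://node/... references."""
        return {
            "@context": dict(self.context),
            "socket_label": self.socket_label,
            self.predicate: [{"@id": f"aiida://node/{node_ids[o]}"} for o in self.objects],
        }


def upsert_record(records: List[Dict[str, Any]], record: Dict[str, Any], predicate: str) -> None:
    """Replace the record with the same socket_label and predicate, otherwise append."""
    for i, existing in enumerate(records):
        if existing.get("socket_label") == record.get("socket_label") and predicate in existing:
            records[i] = record
            return
    records.append(record)


pending = PendingSemantics(
    subject="generate.result",
    objects=["compute_band_structure.result", "compute_density_of_states.result"],
    predicate="emmo:hasProperty",
    context={"emmo": "https://emmo.info/emmo#"},
)
# Placeholder identifiers, known only once the workflow has finished.
node_ids = {"compute_band_structure.result": "node-b", "compute_density_of_states.result": "node-c"}
records: List[Dict[str, Any]] = []
upsert_record(records, pending.resolve(node_ids), pending.predicate)
upsert_record(records, pending.resolve(node_ids), pending.predicate)  # re-run replaces, no duplicate
print(len(records))  # 1
```

Keying the replacement on socket label and predicate keeps repeated runs idempotent while still allowing different predicates to accumulate on the same node.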
Use case 1 — publishable provenance bundle
Goal: accompany a workflow result (e.g. a phase diagram) with a machine-readable packet that citeable repositories can ingest.
Annotate the sockets you care about:
meta(
    semantics={
        "label": "Formation energy",
        "iri": "qudt:Energy",
        "rdf_types": ["qudt:QuantityValue"],
        "attributes": {"qudt:unit": "qudt-unit:EV"},
        "context": {"qudt": "http://qudt.org/schema/qudt/"},
    }
)
Execute the workflow as usual. After the run, fetch result_node.base.extras['semantics'] for the outputs you plan to publish.
Package the JSON-LD snippets next to the usual verdi archive output or upload them to a SPARQL endpoint.
Result: reviewers or collaborators can visualise/validate provenance with RDF tooling (RDFLib, GraphDB, TopBraid) without recreating your environment.
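One way to package the snippets is to merge the per-node records into a single JSON-LD document with a shared @context and an @graph array. A minimal sketch, assuming you have already collected each node's list of records into plain Python lists and that every @context is a dict (JSON-LD also allows strings and arrays there, which this sketch does not handle). bundle_records is a hypothetical helper; conflicting prefix definitions raise an error rather than silently picking one.

```python
import json
from typing import Any, Dict, Iterable, List


def bundle_records(per_node: Iterable[List[Dict[str, Any]]]) -> Dict[str, Any]:
    """Merge JSON-LD snippets from several nodes into one document with a shared @context."""
    context: Dict[str, str] = {}
    graph: List[Dict[str, Any]] = []
    for records in per_node:
        for record in records:
            entry = dict(record)  # shallow copy so the caller's records stay untouched
            for prefix, iri in entry.pop("@context", {}).items():
                # Refuse to silently merge two different IRIs for the same prefix.
                if context.setdefault(prefix, iri) != iri:
                    raise ValueError(f"conflicting definitions for prefix {prefix!r}")
            graph.append(entry)
    return {"@context": context, "@graph": graph}


# Placeholder records shaped like the snippets described on this page.
energy_records = [{
    "@context": {"qudt": "http://qudt.org/schema/qudt/", "qudt-unit": "http://qudt.org/vocab/unit/"},
    "@id": "qudt:Energy",
    "@type": ["qudt:QuantityValue"],
    "qudt:unit": "qudt-unit:EV",
}]
volume_records = [{"@context": {"qudt": "http://qudt.org/schema/qudt/"}, "@id": "qudt:Volume"}]
document = bundle_records([energy_records, volume_records])
print(json.dumps(document, indent=2))
```

Writing document to a .jsonld file next to the archive gives RDF tooling a single entry point instead of one snippet per node.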
Use case 2 — semantic validation in CI
Goal: ensure every published workflow includes specific semantic fields (e.g. a QUDT unit).
Write a pytest rule that queries produced Data nodes and inspects semantics_payload = data_node.base.extras['semantics'].
Assert that each entry has a qudt:unit predicate.
Fail CI if the assertion does not hold.
Result: developers receive immediate feedback when a socket lacks metadata, keeping semantic debt under control.
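The check itself can be written against plain lists of records, which keeps it testable without a database. A sketch, assuming each Data node's extras hold a list of JSON-LD dicts under 'semantics' as described above; gathering those nodes (for example with QueryBuilder) is left out, and the records below are placeholders. records_missing_unit is a hypothetical helper.

```python
from typing import Any, Dict, List


def records_missing_unit(records: List[Dict[str, Any]], predicate: str = "qudt:unit") -> List[str]:
    """Return the @id of every record that lacks the required predicate."""
    return [str(r.get("@id", "<no @id>")) for r in records if predicate not in r]


def test_published_outputs_declare_units():
    # Placeholder records; a real test would gather them from the produced Data nodes.
    records = [
        {"@id": "qudt:PotentialEnergy", "qudt:unit": "qudt-unit:EV"},
        {"@id": "qudt:Volume", "qudt:unit": "qudt-unit:AA3"},
    ]
    missing = records_missing_unit(records)
    assert not missing, f"records without qudt:unit: {missing}"
```

Returning the offending identifiers, rather than a bare boolean, makes the CI failure message point directly at the socket that needs annotating.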
Use case 3 — linking to external repositories
Goal: reference existing materials databases, ELNs, or DOIs directly from your provenance.
Add a relation entry:
meta(
    semantics={
        "label": "Relaxed structure",
        "relations": {
            "schema:isBasedOn": {"@id": "https://materialsproject.org/materials/mp-149"},
        },
    }
)
After execution, the JSON-LD snippet stored on the output includes a link to the external identifier.
Result: you can stitch together experimental ELNs, literature DOIs, and simulation archives without bespoke schema translations.
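External tools recognise such links only after expanding compact identifiers like schema:isBasedOn against @context. The sketch below performs a deliberately simplified expansion (prefix:suffix terms with a dict context only); full JSON-LD expansion covers far more, so a library such as PyLD or RDFLib is the better choice in practice. expand and expand_record are hypothetical helpers, and the schema.org IRI in the example context is an assumption.

```python
from typing import Any, Dict


def expand(term: str, context: Dict[str, str]) -> str:
    """Expand a compact 'prefix:suffix' term; leave full IRIs and unknown prefixes alone."""
    prefix, sep, suffix = term.partition(":")
    if sep and prefix in context and not suffix.startswith("//"):
        return context[prefix] + suffix
    return term


def expand_record(record: Dict[str, Any]) -> Dict[str, Any]:
    """Expand predicate keys and nested {'@id': ...} values of a flat record."""
    context = record.get("@context", {})
    expanded: Dict[str, Any] = {}
    for key, value in record.items():
        if key == "@context":
            continue
        if isinstance(value, dict) and "@id" in value:
            value = {**value, "@id": expand(value["@id"], context)}
        new_key = key if key.startswith("@") else expand(key, context)
        expanded[new_key] = value
    return expanded


record = {
    # Assumed schema.org namespace; the engine's registry value may differ.
    "@context": {"schema": "https://schema.org/"},
    "schema:isBasedOn": {"@id": "https://materialsproject.org/materials/mp-149"},
}
print(expand_record(record))
```

After expansion the predicate is a full IRI and the Materials Project URL is untouched, so the two systems can be joined on plain strings.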
Attaching ontology hints to sockets
Every socket spec exposes a dedicated meta.semantics attribute
alongside the legacy meta.extras dictionary. The semantics helper
looks for:
meta.semantics (or meta.extras['semantics'] / ['ontology'] / ['prov']) – the primary payload. Fields you can use:
  label – human-readable description.
  iri – canonical identifier for the concept (e.g. qudt:PotentialEnergy or https://purl.obolibrary.org/obo/CHEBI_27568).
  rdf_types – list of additional @type entries (e.g. qudt:QuantityValue).
  context – prefix-to-IRI map so you can use short identifiers.
  attributes – predicate/value pairs (units, uncertainty, DOI, temperature, etc.).
  relations – nested dictionaries describing links to other resources (values can themselves be {"@id": ...} or plain strings).
Convenience keys (iri, label, rdf_types) – if present outside the main semantics payload (e.g. declared directly in meta.extras), they are folded into the payload so legacy annotations still work.
Arbitrary extra metadata – anything else in meta.extras is untouched, so you can track workflow-specific hints alongside semantics.
During execution the engine stores the annotation on every Data node
that crosses the annotated socket, so the information is available to
QueryBuilder searches or post-processing scripts without touching the
parent process nodes.
Runnable example
The script below builds a minimal workflow that computes a lattice energy
with ASE+EMT, annotates the output socket with QUDT terms (note how the
unit is declared with the qudt:unit predicate), executes the graph
locally, and prints the resulting JSON-LD snippets. It requires an active
AiiDA profile and the Local engine.
import json
import typing as t

from aiida import load_profile, orm
from node_graph import task
from node_graph.socket_spec import meta
from node_graph_engine.engines.local import LocalEngine

try:  # pragma: no cover - optional dependency for documentation builds
    from ase import Atoms
    from ase.build import bulk
except Exception:  # pragma: no cover - optional dependency
    Atoms = None  # type: ignore[assignment]
    bulk = None  # type: ignore[assignment]

profile_loaded = False
try:  # pragma: no cover - load_profile interacts with global state
    load_profile()
    profile_loaded = True
except Exception as exc:  # pragma: no cover - documentation build environments may skip AiiDA
    print(f"Skipping execution because no AiiDA profile is available: {exc}")

SEMANTICS: t.Dict[str, t.Any] = {
    "label": "Cohesive energy",
    "iri": "qudt:PotentialEnergy",
    "rdf_types": ["qudt:QuantityValue"],
    "context": {
        "qudt": "http://qudt.org/schema/qudt/",
        "qudt-unit": "http://qudt.org/vocab/unit/",
    },
    "attributes": {"qudt:unit": "qudt-unit:EV"},
    "relations": {
        "schema:isBasedOn": {
            "@id": "https://materialsproject.org/materials/mp-149",
        }
    },
}


@task()
def calc_energy(
    atoms: Atoms,
) -> t.Annotated[
    float,
    meta(
        semantics=SEMANTICS,
        extras={"workflow_hint": "emt-energy"},
    ),
]:
    """Return EMT potential energy and attach ontology metadata."""
    from ase.calculators.emt import EMT  # imported lazily for the gallery

    atoms.calc = EMT()  # replaces the deprecated atoms.set_calculator(EMT())
    return atoms.get_potential_energy()


@task.graph()
def EnergyWorkflow(atoms: Atoms):
    """Single-step workflow so we get provenance + semantics automatically."""
    return calc_energy(atoms=atoms).result


if not profile_loaded or Atoms is None or bulk is None:
    print(
        "Ontology semantics demo requires AiiDA + ASE; install dependencies to run the example."
    )
else:
    aluminum = bulk("Al", "fcc", a=4.05)
    graph = EnergyWorkflow.build(atoms=aluminum)
    engine = LocalEngine(name="ontology-demo")
    outputs = engine.run(graph)
    print("\nGraph result:", outputs)

    # Semantics recorded on each workflow output
    for label, output in outputs.items():
        payload = output.base.extras.all.get("semantics_ref")
        print(f"\nOutput '{label}' semantics records:")
        print(json.dumps(payload, indent=2))

    # Scan every node linked from the workflow node for a semantics reference
    workflow_node = orm.load_node(engine._graph_pid)
    outgoing = workflow_node.base.links.get_outgoing()
    for entry in outgoing:
        semantics_ref = entry.node.base.extras.all.get("semantics_ref")
        if semantics_ref:
            print(f"\nData node '{entry.link_label}' carries a semantics reference.")
Running task: calc_energy
Call kwargs: {'atoms': Atoms(symbols='Al', pbc=True, cell=[[0.0, 2.025, 2.025], [2.025, 0.0, 2.025], [2.025, 2.025, 0.0]])}
Persisting workflow knowledge for process node 38
Graph result: {'result': <Float: uuid: 78be8dd4-980f-494f-8110-a417db9d7999 (pk: 40) value: -0.00150204758623>}
Output 'result' semantics records:
null
Decoding the example annotations
Each entry stored under node.base.extras['semantics'] is the JSON-LD
representation of your annotation. You will see:
@context with prefixes declared under semantics.context.
@id derived from semantics.iri.
@type mirroring semantics.rdf_types.
Literal predicates from semantics.attributes and relationship predicates from semantics.relations.
Because the payload lives on the Data node itself, you can query it via
AiiDA’s QueryBuilder or export it with the usual provenance bundles.
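The mapping in that list can be written out as a small function, which is handy for predicting what a given annotation will produce. This is a sketch of the listed correspondences only, not the engine's serializer: where label ends up is not specified above, so it is omitted here, and registry-injected context entries are not reproduced. to_jsonld is a hypothetical name.

```python
import json
from typing import Any, Dict


def to_jsonld(semantics: Dict[str, Any]) -> Dict[str, Any]:
    """Map a semantics payload onto @context, @id, @type and predicates (hypothetical)."""
    record: Dict[str, Any] = {"@context": dict(semantics.get("context", {}))}
    if "iri" in semantics:
        record["@id"] = semantics["iri"]
    if semantics.get("rdf_types"):
        record["@type"] = list(semantics["rdf_types"])
    record.update(semantics.get("attributes", {}))  # literal predicates (units, DOIs, ...)
    record.update(semantics.get("relations", {}))   # relationship predicates
    return record


SEMANTICS = {
    "label": "Cohesive energy",
    "iri": "qudt:PotentialEnergy",
    "rdf_types": ["qudt:QuantityValue"],
    "context": {
        "qudt": "http://qudt.org/schema/qudt/",
        "qudt-unit": "http://qudt.org/vocab/unit/",
    },
    "attributes": {"qudt:unit": "qudt-unit:EV"},
    "relations": {
        "schema:isBasedOn": {"@id": "https://materialsproject.org/materials/mp-149"},
    },
}
print(json.dumps(to_jsonld(SEMANTICS), indent=2))
```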
EOS workflow example
The tutorial on the Equation of State workflow already builds a multi-step graph with relaxation, structure generation, bulk EMT calculations, and a Birch-Murnaghan fit. Below we extend that tutorial with semantic annotations so the fitted parameters and intermediate energy/volume points carry ontology metadata.
Annotating the EOS tasks
Only two tasks need changes: the per-structure energy/volume calculator
and the final EOS fitting task. The snippet below shows the additions
(the new parts are the meta(semantics=...) payloads). You would paste these
definitions into the tutorial notebook or script before the
eos_workflow graph declaration.
Note
The code below is for illustration and is not run as part of this script.
from typing import Annotated

from ase import Atoms
from ase.calculators.emt import EMT
from node_graph import task
from node_graph.socket_spec import meta, namespace

ENERGY_META = meta(
    semantics={
        "label": "Cohesive energy",
        "iri": "qudt:PotentialEnergy",
        "rdf_types": ["qudt:QuantityValue"],
        "context": {
            "qudt": "http://qudt.org/schema/qudt/",
            "qudt-unit": "http://qudt.org/vocab/unit/",
        },
        "attributes": {"qudt:unit": "qudt-unit:EV"},
    }
)

VOLUME_META = meta(
    semantics={
        "label": "Cell volume",
        "iri": "qudt:Volume",
        "rdf_types": ["qudt:QuantityValue"],
        "context": {"qudt": "http://qudt.org/schema/qudt/"},
        "attributes": {"qudt:unit": "qudt-unit:AA3"},
    }
)


@task()
def calculate_energy_and_volume(atoms: Atoms) -> Annotated[
    dict,
    namespace(energy=ENERGY_META, volume=VOLUME_META),
]:
    atoms = Atoms.fromdict(atoms)
    atoms.calc = EMT()
    atoms.get_potential_energy()
    return {
        "energy": atoms.calc.results["energy"],
        "volume": atoms.get_volume(),
    }


@task()
def fit_eos_model(data: Annotated[dict, "dynamic(dict)"]) -> Annotated[
    dict,
    namespace(
        v0_A3=meta(
            semantics={
                "label": "Equilibrium volume",
                "iri": "qudt:Volume",
                "attributes": {"qudt:unit": "qudt-unit:AA3"},
            }
        ),
        e0_eV=meta(
            semantics={
                "label": "Minimum energy",
                "iri": "qudt:PotentialEnergy",
                "attributes": {"qudt:unit": "qudt-unit:EV"},
            }
        ),
        B_GPa=meta(
            semantics={
                "label": "Bulk modulus",
                "iri": "qudt:BulkModulus",
                "rdf_types": ["qudt:QuantityValue"],
                "context": {
                    "qudt": "http://qudt.org/schema/qudt/",
                    "qudt-unit": "http://qudt.org/vocab/unit/",
                },
                "attributes": {"qudt:unit": "qudt-unit:GPA"},
            }
        ),
    ),
]:
    from ase.eos import EquationOfState
    from ase.units import kJ

    volumes_list = [value["volume"] for value in data.values()]
    energies_list = [value["energy"] for value in data.values()]
    eos = EquationOfState(volumes_list, energies_list)
    v0, e0, B = eos.fit()
    # ASE reports B in eV/Å^3; convert it to GPa.
    B_GPa = B / kJ * 1.0e24
    return {"v0_A3": v0, "e0_eV": e0, "B_GPa": B_GPa}
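The B / kJ * 1.0e24 line converts the fitted bulk modulus from eV/Å³ into GPa, which is why the B_GPa socket carries a GPa unit annotation. The arithmetic can be checked with plain SI constants instead of ase.units, so this snippet runs without ASE. Note that ASE may take its constants from an older CODATA set, so its kJ value can differ from KJ_IN_EV below in the last digits.

```python
# Elementary charge in coulombs (exact SI value): 1 eV = E_CHARGE joules.
E_CHARGE = 1.602176634e-19
# Plays the role of ase.units.kJ: one kilojoule expressed in eV.
KJ_IN_EV = 1000.0 / E_CHARGE


def bulk_modulus_gpa(b_ev_per_a3: float) -> float:
    """Convert eV/Å^3 to GPa with the same formula as fit_eos_model."""
    # B / kJ gives kJ/Å^3; 1 kJ/Å^3 = 1e3 J / 1e-30 m^3 = 1e33 Pa = 1e24 GPa.
    return b_ev_per_a3 / KJ_IN_EV * 1.0e24


def bulk_modulus_gpa_direct(b_ev_per_a3: float) -> float:
    """Same conversion from first principles: eV -> J, Å^3 -> m^3, Pa -> GPa."""
    return b_ev_per_a3 * E_CHARGE / 1e-30 / 1e9


print(bulk_modulus_gpa(1.0))  # approximately 160.2177
```

Both routes give 1 eV/Å³ ≈ 160.22 GPa, so a typical metal's bulk modulus of a few tenths of an eV/Å³ lands in the tens of GPa.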
Running the EOS workflow with semantics
With the annotated tasks in place, the existing eos_workflow
definition from the tutorial needs no further changes. Build the graph,
run it with your preferred engine (Local is shown here), and inspect the
stored semantics on the resulting Data nodes:
# This block assumes `eos_workflow` graph is defined from the other tutorial
# from ase.build import bulk
# from aiida run
Implementation details
The semantics feature is implemented by extending the existing
TaskMeta to store the TaskSemantics object when building the
task executor.
When executing a task, the engine stores the semantic information on the
AiiDA Data nodes as extras.
If there are cross-socket references, the engine resolves them to the aiida://node/… format.
Total running time of the script: (2 minutes 33.310 seconds)