AiiDA v2.7.0 preview#

With the release of aiida-core version 2.7.0 just around the corner, in this blog post we’d like to give you an overview of the exciting new features and important bug fixes of this minor release. Release candidates are already available on PyPI and conda-forge, as well as a Docker image for testing purposes. Feedback is welcome!

Asynchronous SSH connection (#6626)#

Previously, while a data transfer with a remote computer was active, the responsible transport plugin blocked further program execution until the communication completed. Lifting this long-standing limitation opened up an opportunity for significant performance improvements.

With the introduction of the new asynchronous SSH transport plugin (core.ssh_async), multiple communications with a remote machine can now happen concurrently.[1]
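
The new plugin is selected like any other transport plugin when setting up a computer. Below is a minimal sketch using the Python API; the computer label, hostname, scheduler, and work directory are placeholders, and the exact configuration options of core.ssh_async may differ from those of core.ssh (see PR #6626 and the documentation for details):

from aiida import load_profile, orm

load_profile()

# Create a computer entry that uses the asynchronous transport instead of 'core.ssh'
computer = orm.Computer(
    label='my_cluster',
    hostname='cluster.example.com',
    transport_type='core.ssh_async',
    scheduler_type='core.slurm',
    workdir='/scratch/{username}/aiida/',
).store()

# Credentials and connection details are then configured as usual, e.g. interactively via
#   verdi computer configure core.ssh_async my_cluster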

🚀 When core.ssh_async outperforms core.ssh#

core.ssh_async offers significant performance gains in scenarios where the worker is blocked by heavy transfer tasks, such as uploading, downloading, or copying large files.

Example: Submitting two WorkGraphs/WorkChains with the following logic:

  1. WorkGraph 1 – Heavy I/O operations

    • Uploads a 10 MB file

    • Remotely copies a 1 GB file

    • Retrieves a 1 GB file

  2. WorkGraph 2 – Lightweight task

    • Executes a simple shell command: touch file

Measured time until the second WorkGraph is processed (single worker):

  • core.ssh_async: Only 4 seconds! 🚀🚀🚀🚀 A dramatic improvement!

  • core.ssh: 108 seconds (the second task waits for the first to finish)

⚖️ When core.ssh_async and core.ssh perform similarly#

For mixed workloads involving numerous uploads and downloads—a common real-world use case—the performance gains depend on the specific conditions.

Large file transfers (~1 GB):#

core.ssh_async typically outperforms due to concurrent upload and download streams. In favorable network conditions, this can nearly double the effective bandwidth.

Example: On a network with a baseline of 11.8 MB/s, the asynchronous mode approached nearly twice that speed under light load (see graph in PR #6626).

Test case:
Two WorkGraphs: one uploads 1 GB, the other retrieves 1 GB using RemoteData.

  • core.ssh_async: 120 seconds

  • core.ssh: 204 seconds

Small file transfers (many small files):#

Here, the overhead of managing asynchronous operations can outweigh the benefits.

Test case:
25 WorkGraphs, each transferring several ~1 MB files.

  • core.ssh_async: 105 seconds

  • core.ssh: 65 seconds

To conclude, the best choice of transport plugin depends on your use case: use core.ssh_async for workloads involving large file transfers, or when you need to prevent I/O operations from blocking other tasks, and stick with core.ssh for scenarios dominated by many small file transfers, where the asynchronous overhead can reduce performance.

Extended dumping support for profiles and groups (#6723)#

In version 2.6.0, AiiDA introduced the ability to dump processes from the database into a human-readable, structured folder format. Building on this feature, support has now been extended to dumping entire groups and profiles, enabling users to retrieve their AiiDA data more easily. This enhancement is part of our broader roadmap to improve AiiDA’s usability, especially for new users, who may find it challenging to manually construct the appropriate queries to extract data from the database. The functionality is accessible via the verdi CLI:

verdi profile dump --all          # This dumps the whole current profile
verdi profile dump --groups <PK>  # This dumps one selected group as part of the profile dumping operation
verdi group dump <PK>             # This dumps only the selected group, disregarding other profile data

Since dumping an entire profile can be a resource- and I/O-intensive operation (for large profiles), significant effort has been made to provide flexible options for fine-tuning which nodes are included in the dump.[2] Below is a snippet from the command’s help output:

Usage: verdi profile dump [OPTIONS] [--]

  Dump all data in an AiiDA profile's storage to disk.

Options:
  -p, --path PATH                 Base path for dump operations that write to
                                  disk.
  -n, --dry-run                   Perform a dry run.
  -o, --overwrite                 Overwrite file/directory when writing to
                                  disk.
  -a, --all                       Include all entries, disregarding all other
                                  filter options and flags.
  -X, --codes CODE...             One or multiple codes identified by their
                                  ID, UUID or label.
  -Y, --computers COMPUTER...     One or multiple computers identified by
                                  their ID, UUID or label.
  -G, --groups GROUP...           One or multiple groups identified by their
                                  ID, UUID or label.
  -u, --user USER                 Email address of the user.
  -p, --past-days PAST_DAYS       Only include entries created in the last
                                  PAST_DAYS number of days.
  --start-date TEXT               Start date for node mtime range selection
                                  for node collection dumping.
  --end-date TEXT                 End date for node mtime range selection for
                                  node collection dumping.
  --filter-by-last-dump-time / --no-filter-by-last-dump-time
                                  Only select nodes whose mtime is after the
                                  last dump time.  [default: filter-by-last-
                                  dump-time]
  --only-top-level-calcs / --no-only-top-level-calcs
                                  Dump calculations in their own dedicated
                                  directories, not just as part of the dumped
                                  workflow.  [default: only-top-level-calcs]
  --only-top-level-workflows / --no-only-top-level-workflows
                                  If a top-level workflow calls sub-workflows,
                                  create a designated directory only for the
                                  top-level workflow.  [default: only-top-
                                  level-workflows]
  --delete-missing / --no-delete-missing
                                  If a previously dumped group or node is
                                  deleted from the DB, delete the
                                  corresponding dump directory.  [default:
                                  delete-missing]
  --symlink-calcs / --no-symlink-calcs
                                  Symlink workflow sub-calculations to their
                                  own dedicated directories.  [default: no-
                                  symlink-calcs]
  --organize-by-groups / --no-organize-by-groups
                                  If the collection of nodes to be dumped is
                                  organized in groups, reproduce its
                                  hierarchy.  [default: organize-by-groups]
  --also-ungrouped / --no-also-ungrouped
                                  Dump also data of nodes that are not part of
                                  any group.  [default: no-also-ungrouped]
  --relabel-groups / --no-relabel-groups
                                  Update directories and log entries for the
                                  dumping if groups have been relabeled since
                                  the last dump.  [default: relabel-groups]
  --include-inputs / --exclude-inputs
                                  Include linked input nodes of
                                  `CalculationNode`(s).  [default: include-
                                  inputs]
  --include-outputs / --exclude-outputs
                                  Include linked output nodes of
                                  `CalculationNode`(s).  [default: exclude-
                                  outputs]
  --include-attributes / --exclude-attributes
                                  Include attributes in the
                                  `aiida_node_metadata.yaml` written for every
                                  `ProcessNode`.  [default: include-
                                  attributes]
  --include-extras / --exclude-extras
                                  Include extras in the
                                  `aiida_node_metadata.yaml` written for every
                                  `ProcessNode`.  [default: exclude-extras]
  -f, --flat                      Dump files in a flat directory for every
                                  step of a workflow.
  --dump-unsealed / --no-dump-unsealed
                                  Also allow the dumping of unsealed process
                                  nodes.  [default: no-dump-unsealed]
  -v, --verbosity [notset|debug|info|report|warning|error|critical]
                                  Set the verbosity of the output.
  -h, --help                      Show this message and exit.

Another key feature is the incremental nature of the command, which ensures that the dumping process synchronizes the output folder with the internal state of AiiDA’s DB by gradually adding or removing files on successive executions of the command. This allows for efficient updates without having to overwrite everything, and is in contrast to AiiDA archive creation, which is a one-shot process. The behavior can further be adjusted using:

  • --dry-run (-n): to simulate the dump without writing any files.

  • --overwrite (-o): to fully overwrite the target directory if it already exists.

Finally, the command provides various options to customize the output folder structure, for instance, to reflect the group hierarchy of AiiDA’s internal DB state, to symlink duplicate calculations (e.g., calculations that are contained in multiple groups), to create dedicated directories for the sub-workflows and calculations of top-level workflows, and more.

These enhancements aim to make data export from AiiDA more robust, customizable, and user-friendly.

Stashing (#6746, #6772)#

With this feature, you can bundle your data into a (compressed) tar archive during stashing by specifying one of the stash_mode options "tar", "tar.bz2", "tar.gz", or "tar.xz". When defining the stashing operation during the setup of your calculation, compression can be configured as follows:

from aiida.plugins import CalculationFactory
from aiida.engine import run
from aiida.common import StashMode
from aiida.orm import load_computer

MyCalculation = CalculationFactory('<PLUGIN_ENTRY_POINT>')  # entry point of the calculation plugin you want to run

inputs = {
    ...,
    'metadata': {
        'computer': load_computer(label="localhost"),
        'options': {
            'resources': {'num_machines': 1},
            'stash': {
                'stash_mode':  StashMode.COMPRESS_TARGZ,
                'target_base': '/scratch/',
                'source_list': ['heavy_data.xyz'],  # ['*'] to stash everything
            },
        },
    },
}
# If you use a builder, use
# builder.metadata = {'options': {...}, ...}

run(MyCalculation, **inputs)
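
If you are unsure which StashMode member corresponds to which of the string options listed above, the enum can simply be inspected; a small sketch using only the StashMode import already shown above:

from aiida.common import StashMode

# List every available stash mode together with its string value,
# e.g. to check which member corresponds to 'tar.gz'
for mode in StashMode:
    print(f'{mode!r:40} -> {mode.value!r}')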

In addition, it was historically only possible to enable stashing by instructing it before running a generic CalcJob, i.e., the instruction had to be “attached” to the original CalcJob before its execution. If a user realized only after running the calculation that they needed to stash something, this was not possible. With v2.7.0, we introduce the new StashCalculation CalcJob, which can perform a stashing operation after a calculation has finished, provenance included! The usage is very similar, and for consistency and user-friendliness, the instructions are kept as part of the metadata. The only main input is the remote_folder output node (an instance of RemoteData) of the source calculation whose data should be stashed, for example:

from aiida.plugins import CalculationFactory
from aiida.engine import run
from aiida.common import StashMode
from aiida.orm import load_node

StashCalculation = CalculationFactory('core.stash')

calcjob_node = load_node(<CALCJOB_PK>)
inputs = {
    'metadata': {
        'computer': calcjob_node.computer,
        'options': {
            'resources': {'num_machines': 1},
            'stash': {
                'stash_mode':  StashMode.COPY.value,
                'target_base': '/scratch/',
                'source_list': ['heavy_data.xyz'],
            },
        },
    },
    'source_node': calcjob_node.outputs.remote_folder,
}

result = run(StashCalculation, **inputs)

Forcefully killing processes (#6793)#

Prior to v2.7.0, the verdi process kill command could hang if a connection to the remote computer could not be established. A new --force option has been introduced to terminate a process without waiting for a response from the remote machine.
Note: Using --force may result in orphaned jobs on the remote system if the remote job cancellation fails.

verdi process kill --force <PROCESS_ID>

We now also cancel the previous kill action if the kill command is resent by the user. This allows the user to adapt the parameters of the exponential backoff mechanism (EBM) applied by AiiDA via verdi config and then resend the kill command with the new parameters:

verdi process kill --timeout 5 <PROCESS_ID>
verdi config set transport.task_maximum_attempts 1
verdi config set transport.task_retry_initial_interval 5
verdi daemon restart
verdi process kill <PROCESS_ID>

Furthermore, the timeout and wait options were not behaving correctly; they have now been fixed and merged into a single timeout option. Passing --timeout 0 replicates the former --no-wait behavior, meaning the command does not block until the action has finished, while passing --timeout inf (the default, replicating --wait without a timeout) makes the command block until a response is received. For more information, see issue #6524.

Serialization of ORM nodes (#6723)#

AiiDA’s Python API provides an object relational mapper (ORM) that abstracts the various entities that can be stored inside the provenance graph (via the SQL database) and the relationships between them. In most use cases, users use this ORM directly in Python to construct new instances of entities and retrieve existing ones, in order to access and manipulate their data. A shortcoming of the ORM so far was that it was not possible to programmatically introspect the schema of each entity, that is, what data each entity stores. This made it difficult for external applications to provide interfaces to create or retrieve entity instances. It also made it difficult to take the data outside of the Python environment, since the data would have to be serialized; however, without a well-defined schema, doing so without an ad-hoc solution is practically impossible.

With the implementation of a pydantic Model for each entity, we now allow external applications to programmatically determine the schema of all AiiDA ORM entities and to automatically (de)serialize entity instances to and from other data formats, e.g., JSON. An example of how this is done for an AiiDA integer node:

from aiida.orm import Int

node = Int(5)  # can be any ORM node
serialized_node = node.serialize()
print(serialized_node)
# Out: {'pk': None, 'uuid': '485c2ec8-441d-484d-b7d9-374a3cdd98ae', 'node_type': 'data.core.int.Int.', 'process_type': None, 'repository_metadata': {}, 'ctime': datetime.datetime(2025, 5, 2, 10, 20, 41, 275443, tzinfo=datetime.timezone(datetime.timedelta(seconds=7200), 'CEST')), 'mtime': None, 'label': '', 'description': '', 'attributes': {'value': 5}, 'extras': {}, 'computer': None, 'user': 1, 'repository_content': {}, 'source': None, 'value': 5}
node_deserialized = Int.from_serialized(**serialized_node)
print(node_deserialized)
# Out: uuid: 77e9c19a-5ecb-40cf-8238-ea5c55fbb83f (unstored) value: 5
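
Beyond (de)serialization, the schema itself can be introspected programmatically. The snippet below is a sketch that assumes the pydantic model class is exposed as the entity’s Model attribute and uses the standard pydantic v2 call; see AEP 010 for the authoritative interface:

from aiida.orm import Int

# Introspect the entity schema via its pydantic model (standard pydantic v2 call);
# assumes the model class is exposed as `Int.Model`.
schema = Int.Model.model_json_schema()
print(sorted(schema['properties']))
# Expected to list the fields seen in the serialized dict above,
# e.g. 'attributes', 'ctime', 'label', 'uuid', 'value', ...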

For an extensive overview of the implications see AEP 010.

Miscellaneous#

  • aiida-core is now compatible with Python 3.13 #6600

  • Improved Windows support #6715

  • RemoteData extended by member function get_size_on_disk #6584

  • SinglefileData extended by constructor from_bytes #6653

  • Allow zero memory specification for SLURM #6605

  • Add filters to verdi group delete #6556

  • verdi storage maintain shows a progress bar #6562

  • New transport endpoints compress & extract #6743

  • Implementation of missing SQLite endpoints (en route to full feature parity between PostgreSQL and SQLite):