
Inference using ASE and Predictor Interface

Inference is done using MLIPPredictUnit. The FAIRChemCalculator (an ASE calculator) is simply a convenience wrapper around the MLIPPredictUnit.

from __future__ import annotations

from fairchem.core import FAIRChemCalculator, pretrained_mlip

predictor = pretrained_mlip.get_predict_unit("uma-s-1p2", device="cuda")
calc = FAIRChemCalculator(predictor, task_name="oc20")

Default mode

UMA is designed both for general-purpose usage (single or batched systems) and for long single-system rollouts (MD simulations, relaxations, etc.). For general-purpose use, we suggest the default settings: a good trade-off between accuracy, speed, and memory consumption that should suffice for most applications. In this setting, on a single 80GB H100 GPU, a user should be able to compute on systems as large as 50k-100k neighbors (depending on their atomic density). Batching is also supported in this mode.

Turbo mode

For long-rollout trajectory use cases, such as molecular dynamics (MD) or relaxations, we provide a special mode called turbo, which optimizes for speed but restricts the user to a single system whose atomic composition is held constant. Turbo mode is approximately 1.5-2x faster than default mode, depending on the situation; however, batching is not supported in this mode. It can be easily activated as shown below.

predictor = pretrained_mlip.get_predict_unit(
    "uma-s-1p2", device="cuda", inference_settings="turbo"
)

Custom modes for advanced users

Advanced users may notice that default mode and turbo mode are simply special cases of our inference settings API. You can customize the settings for your application if you understand the trade-offs. The following table provides more information.

| Setting Flag | Description |
| --- | --- |
| `tf32` | Enables the torch TF32 format for matrix multiplication. This speeds up inference at a slight cost in precision. In our tests it makes minimal difference for most applications and preserves equivariance and energy conservation over long rollouts. However, if you are computing higher-order derivatives such as Hessians, or performing other calculations that require strict numerical precision, we recommend turning this off. |
| `activation_checkpointing` | Uses a custom chunked activation checkpointing algorithm that yields significant memory savings for a small inference-speed penalty. If you are predicting on systems >1000 atoms, we recommend keeping this on. However, if you want the absolute fastest inference possible for small systems, you can turn this off. |
| `merge_mole` | Useful in long-rollout applications where the system composition stays constant. By pre-merging the MoLE weights, we save both memory and compute. |
| `compile` | Uses `torch.compile` to significantly speed up computation. Due to the way PyTorch traces the internal graph, the first iteration incurs a long compile time, and recompilation can be triggered whenever a significant change in input dimensions is detected. Not recommended if you frequently compute on very different atomic systems. |
| `external_graph_gen` | Only use this if you want to supply an external graph generator. Rarely needed except for development. |
| `internal_graph_gen_version` | We currently support v2 (default), an internal implementation better suited for parallelism, and v3, the neighbor list from the NVIDIA ALCHEMI library, which is faster for single-GPU operation. |
| `edge_chunk_size` | Experimental. Used for padding edge sizes, which helps reduce recompilations from `torch.compile`. Defaults to `None`. |
| `use_quaternion_wigner` | Enables quaternion-based Wigner-D matrix computation. If `False`, falls back to Euler-angle-based rotations. Defaults to `True`. |
| `base_precision_dtype` | Governs the main precision type of the computation. Defaults to FP32; FP64 is also supported. |
| `execution_mode` | Allows manually toggling custom backends to maximize speedups. Defaults to `None`, in which case the predictor automatically determines the best backend. For example, `"umas-fast-gpu"` introduces a 30-40% speedup for the uma-s line of models. |

For example, for an MD simulation use-case for a system of ~500 atoms, we can choose to use a custom mode like the following:

from fairchem.core.units.mlip_unit.api.inference import InferenceSettings

settings = InferenceSettings(
    tf32=True,
    activation_checkpointing=False,
    merge_mole=True,
    compile=True,
    external_graph_gen=False,
    internal_graph_gen_version=2,
)

predictor = pretrained_mlip.get_predict_unit(
    "uma-s-1p2", device="cuda", inference_settings=settings
)

Enabling gradient stress or Hessian prediction

Some tasks, for example omol, odac, or oc20/25, were not trained with stress labels. Similarly, no tasks were supervised to predict Hessians. However, predictions of untrained derivatives of the energy, such as stress and Hessians, can be enabled with the following inference settings flags:

| Setting Flag | Description |
| --- | --- |
| `predict_untrained_forces` | A set of task/dataset names (e.g., `{"omol", "oc20"}`) for which forces will be computed via autograd even though the checkpoint was not trained with a forces head for those tasks. |
| `predict_untrained_stress` | A set of task/dataset names for which stress tensors will be computed via autograd even though the checkpoint was not trained with a stress head for those tasks. The default empty set disables this. |
| `predict_untrained_hessian` | A set of task/dataset names for which the Hessian matrix will be computed via autograd. |

For example, to enable stress and Hessian predictions at the omol level of theory, the following settings can be used:

settings = InferenceSettings(
    predict_untrained_stress={'omol'},
    predict_untrained_hessian={'omol'}
)

predictor = pretrained_mlip.get_predict_unit(
    "uma-s-1p2", device="cuda", inference_settings=settings
)

Multi-GPU Inference

UMA supports graph-parallel inference natively. The graph is chunked across ranks, and both the forward and backward communication are handled by the built-in graph-parallel algorithm with torch distributed. Because multi-GPU inference requires special setup of communication protocols within a node and across nodes, we leverage Ray to launch a Ray actor for each GPU rank under the hood. This allows us to scale seamlessly to any infrastructure that can run Ray.

To make things simple for users who want to run multi-GPU inference locally, we provide a drop-in replacement for MLIPPredictUnit called ParallelMLIPPredictUnit. It requires the extras dependencies:

pip install fairchem-core[extras]

For example, we can create a predictor with multiple GPU workers (controlled by the workers argument) in much the same way as MLIPPredictUnit and perform an MD calculation with the ASE calculator. This mode of operation is also compatible with our LAMMPS integration.

import time

from ase import units
from ase.md.langevin import Langevin

from fairchem.core import FAIRChemCalculator, pretrained_mlip
from fairchem.core.datasets.common_structures import get_fcc_crystal_by_num_atoms

predictor = pretrained_mlip.get_predict_unit(
    "uma-s-1p2", inference_settings="turbo", device="cuda", workers=1
)
calc = FAIRChemCalculator(predictor, task_name="omat")

atoms = get_fcc_crystal_by_num_atoms(8000)
atoms.calc = calc

dyn = Langevin(
    atoms,
    timestep=0.1 * units.fs,
    temperature_K=400,
    friction=0.001 / units.fs,
)
# warm up for 10 steps before timing
dyn.run(steps=10)
warmup_steps = dyn.get_number_of_steps()
start_time = time.time()
dyn.attach(
    lambda: print(
        f"Step: {dyn.get_number_of_steps()}, E: {atoms.get_potential_energy():.3f} eV, "
        # exclude the warmup steps from the throughput (QPS) estimate
        f"QPS: {(dyn.get_number_of_steps() - warmup_steps)/(time.time()-start_time):.2f}"
    ),
    interval=1,
)
dyn.run(steps=1000)