Model FAQ#

If you don’t find your question answered here, please feel free to file a GitHub issue or post on the discussion board.

Models#

Are predictions from FAIRChem models deterministic?#

By deterministic, we mean that multiple calls to the same function, given the same inputs (and seed), will produce the same results.

On CPU, all operations should be deterministic. On GPU, scatter calls – which are used in the node aggregation functions to get the final energy – are non-deterministic, since the order of parallel operations is not uniquely determined [1, 2]. Moreover, results may be different between GPU and CPU executions [3].

To get deterministic results on GPU, use torch.use_deterministic_algorithms where available (for example, see scatter_det). Note that deterministic operations are often slower than their non-deterministic counterparts, so while this may be worth it for testing and debugging, it is not recommended for large-scale training and inference.
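
As a minimal sketch (this uses standard PyTorch settings, not a fairchem-specific API), you can request deterministic kernels globally before building the model:

import os
import torch

# Must be set before any CUDA work happens; required by cuBLAS when
# deterministic algorithms are requested (CUDA >= 10.2).
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

# Ask PyTorch to use deterministic kernels where they exist; ops without a
# deterministic implementation will raise an error instead of silently
# producing non-reproducible results.
torch.use_deterministic_algorithms(True)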

How do I train a model on OC20 total energies?#

By default, the OC20 S2EF/IS2RE LMDBs store adsorption energies, i.e. the DFT total energy minus the energies of the clean surface and the gas-phase adsorbate.

In order to train a model on DFT total energies, set the following flags in the YAML config:

task:
    ...
    # To train on OC20 total energies, use the 'oc22_lmdb' dataset class.
    dataset: oc22_lmdb
    ...

dataset:
    train:
        ...
        # To train on OC20 total energies, a path to OC20 reference energies
        # `oc20_ref` must be specified to unreference existing data.
        train_on_oc20_total_energies: True
        oc20_ref: path/to/oc20_ref.pkl
        ...
    val:
        ...
        train_on_oc20_total_energies: True
        oc20_ref: path/to/oc20_ref.pkl
        ...

The OC20 reference pickle file, containing the reference energies needed to convert adsorption energies back to total energies, is available for download here.
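
Conceptually, the conversion simply adds each system's reference energy back onto its adsorption energy. The snippet below is only an illustration and assumes the pickle maps a system identifier to a single reference energy; the oc22_lmdb dataset class handles this for you:

import pickle

# Hypothetical illustration of the unreferencing step. Here `oc20_ref` is
# assumed to map a system identifier to the reference energy (clean slab +
# gas-phase adsorbate) that was subtracted to obtain the adsorption energy.
with open("path/to/oc20_ref.pkl", "rb") as f:
    oc20_ref = pickle.load(f)

def to_total_energy(e_adsorption, system_id):
    # Add back the reference energy that was subtracted during referencing.
    return e_adsorption + oc20_ref[system_id]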

To test if your setup is correct, try the following:

from fairchem.core.datasets import OC22LmdbDataset

dset = OC22LmdbDataset({
    "src": "path/to/oc20/lmdb/folder/",
    "train_on_oc20_total_energies": True,
    "oc20_ref": "path/to/oc20_ref.pkl",
})

print(dset[0])
# Data(y=-181.54722937, ...) -- note the magnitude: total DFT energies are much larger than adsorption energies!

Another option that might be useful for training on total energies is passing precomputed per-element average energies with lin_ref. If you use this option, make sure to recompute the normalizer statistics (for energies) after linear referencing.
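
As a rough sketch of what such per-element references are (illustrative only; the variable names and the least-squares fit below are not the fairchem implementation), one can fit a single energy per element so that each total energy is approximately the sum of its atoms' reference energies:

import numpy as np

def fit_elemental_references(energies, atomic_numbers_per_structure, max_z=100):
    # Build a composition matrix: counts[i, z] = number of atoms with atomic
    # number z in structure i.
    counts = np.zeros((len(energies), max_z + 1))
    for i, numbers in enumerate(atomic_numbers_per_structure):
        for z in numbers:
            counts[i, z] += 1
    # Least-squares fit of one reference energy per element.
    coeffs, *_ = np.linalg.lstsq(counts, np.asarray(energies), rcond=None)
    return coeffs  # coeffs[z] is the reference energy for element z

# Normalizer statistics should then be recomputed on the residuals,
# i.e. energies - counts @ coeffs.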

I’m trying to run GemNet-OC / GemNet-dT, but it throws an error that scaling factors are not fitted. What should I do?#

GemNet-OC and GemNet-dT make use of empirical scaling factors that are fit on a few batches of data prior to training in order to stabilize the variance of activations. See Sec. 6 in the GemNet paper for more details on this.

We provide sets of fitted scaling factors as part of the fairchem codebase that you can reuse by passing the scale_file parameter in the YAML config.
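
For example (the file path below is a placeholder; point it at the scaling-factor file that matches your model), the parameter sits in the model section of the config:

model:
    ...
    scale_file: path/to/scaling_factors.json
    ...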

If you change any of the model architecture hyperparameters or the dataset, you should refit these scaling factors:

python src/fairchem/core/modules/scaling/fit.py \
    --config-yml path/to/my/config.yml \
    --checkpoint path/to/save/checkpoint.pt \
    --mode train

This will recalculate the scaling factors and save them in a checkpoint file at path/to/save/checkpoint.pt, which you can then load and launch training from:

python main.py \
    --config-yml path/to/my/config.yml \
    --checkpoint path/to/save/checkpoint.pt \
    --mode train

I’m trying to run GemNet-OC on my data, but it errors out on sph_basis = self.spherical_basis(cosφ_cab, θ_cabd).#

This is likely a tagging issue – GemNet-OC computes quadruplet interactions for atoms tagged as 1 and 2 (see code). In OC20 parlance, tag==1 refers to surface atoms and tag==2 refers to adsorbate atoms. If all the atoms are tagged as 0 (check atoms.get_tags()), no quadruplets are computed, and part of the GemNet-OC forward pass fails. Having some atoms tagged as 1 or 2 in your structure should fix it.
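
For example, if you build your structures with ASE, you can set OC20-style tags yourself before passing them to the model. This is only a toy illustration (a Cu(111) slab with an O adsorbate, with the surface layer picked by z-coordinate); adapt the tagging logic to your own systems:

import numpy as np
from ase.build import fcc111, add_adsorbate

# Build a toy slab + adsorbate and tag it OC20-style:
# 0 = subsurface, 1 = surface, 2 = adsorbate.
slab = fcc111("Cu", size=(2, 2, 3), vacuum=10.0)
n_slab = len(slab)
add_adsorbate(slab, "O", height=1.5, position="fcc")

tags = np.zeros(len(slab), dtype=int)
tags[n_slab:] = 2                      # adsorbate atoms are appended last

# Tag the topmost slab layer as surface atoms (tag 1) based on z-coordinate.
z = slab.positions[:n_slab, 2]
tags[:n_slab][z > z.max() - 0.5] = 1   # 0.5 A tolerance captures one layer

slab.set_tags(tags)
print(slab.get_tags())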