Common gotchas with fairchem#

OutOfMemoryError#

If you see errors like:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 390.00 MiB (GPU 0; 10.76 GiB total capacity; 9.59 GiB already allocated; 170.06 MiB free; 9.81 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

It means your GPU has run out of memory. One common cause is having multiple notebooks open that are all using the GPU, e.g. each one has loaded a calculator. Try closing the other notebooks.
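If you want to see what is currently held on the GPU before restarting anything, torch can report it. A minimal check (assuming torch is installed and can see your GPU):

import torch

# How much memory is allocated/reserved on the current GPU, in GB.
if torch.cuda.is_available():
    print(f'{torch.cuda.memory_allocated() / 1e9:.2f} GB allocated')
    print(f'{torch.cuda.memory_reserved() / 1e9:.2f} GB reserved')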

It could also mean the batch size is too large to fit in memory. You can try making it smaller in the yml config file (optim.batch_size).

It is recommended that you use automatic mixed precision: --amp in the options to main.py, or amp: true in the config.yml.

If it is an option, you can try a GPU with more memory, or you may be able to split the job over multiple GPUs.

I want the energy of a gas phase atom#

But I get an error like

RuntimeError: cannot reshape tensor of 0 elements into shape [0, -1] because the unspecified dimension size -1 can be any value and is ambiguous

The problem here is that no neighbors are found for the single atom, which causes this error. This behavior is model dependent; for some models there is currently no way to get atomic energies.
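You can see the underlying issue with ASE's neighbor list, independent of fairchem: a truly isolated atom has no neighbors within any cutoff, so the model's graph has no edges. A quick sketch:

from ase import Atoms
from ase.neighborlist import neighbor_list

# A gas-phase atom: no cell, no periodic boundary conditions.
atom = Atoms('Cu', positions=[(0, 0, 0)])
i, j = neighbor_list('ij', atom, cutoff=12.0)
print(len(i))  # 0 -> no edges for the model to work with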

from fairchem.core.common.relaxation.ase_utils import OCPCalculator
from fairchem.core.models.model_registry import model_name_to_local_file

# Download (or reuse) a pretrained checkpoint and wrap it in an ASE calculator.
checkpoint_path = model_name_to_local_file('GemNet-OC-S2EFS-OC20+OC22', local_cache='/tmp/fairchem_checkpoints/')
calc = OCPCalculator(checkpoint_path=checkpoint_path)
/home/runner/work/fairchem/fairchem/src/fairchem/core/models/escn/so3.py:23: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  _Jd = torch.load(os.path.join(os.path.dirname(__file__), "Jd.pt"))
/home/runner/work/fairchem/fairchem/src/fairchem/core/models/scn/spherical_harmonics.py:23: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  _Jd = torch.load(os.path.join(os.path.dirname(__file__), "Jd.pt"))
/home/runner/work/fairchem/fairchem/src/fairchem/core/models/equiformer_v2/wigner.py:10: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  _Jd = torch.load(os.path.join(os.path.dirname(__file__), "Jd.pt"))
/home/runner/work/fairchem/fairchem/src/fairchem/core/models/equiformer_v2/layer_norm.py:75: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  @torch.cuda.amp.autocast(enabled=False)
/home/runner/work/fairchem/fairchem/src/fairchem/core/models/equiformer_v2/layer_norm.py:175: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  @torch.cuda.amp.autocast(enabled=False)
/home/runner/work/fairchem/fairchem/src/fairchem/core/models/equiformer_v2/layer_norm.py:263: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  @torch.cuda.amp.autocast(enabled=False)
/home/runner/work/fairchem/fairchem/src/fairchem/core/models/equiformer_v2/layer_norm.py:357: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  @torch.cuda.amp.autocast(enabled=False)
/home/runner/work/fairchem/fairchem/src/fairchem/core/common/relaxation/ase_utils.py:150: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  checkpoint = torch.load(checkpoint_path, map_location=torch.device("cpu"))
WARNING:root:Detected old config, converting to new format. Consider updating to avoid potential incompatibilities.
INFO:root:amp: true
cmd:
  checkpoint_dir: /home/runner/work/fairchem/fairchem/docs/core/checkpoints/2024-11-19-06-21-52
  commit: aa298ac
  identifier: ''
  logs_dir: /home/runner/work/fairchem/fairchem/docs/core/logs/wandb/2024-11-19-06-21-52
  print_every: 100
  results_dir: /home/runner/work/fairchem/fairchem/docs/core/results/2024-11-19-06-21-52
  seed: null
  timestamp_id: 2024-11-19-06-21-52
  version: 0.1.dev1+gaa298ac
dataset:
  format: oc22_lmdb
  key_mapping:
    force: forces
    y: energy
  normalize_labels: false
  oc20_ref: /checkpoint/janlan/ocp/other_data/final_ref_energies_02_07_2021.pkl
  raw_energy_target: true
evaluation_metrics:
  metrics:
    energy:
    - mae
    forces:
    - forcesx_mae
    - forcesy_mae
    - forcesz_mae
    - mae
    - cosine_similarity
    - magnitude_error
    misc:
    - energy_forces_within_threshold
  primary_metric: forces_mae
gp_gpus: null
gpus: 0
logger: wandb
loss_functions:
- energy:
    coefficient: 1
    fn: mae
- forces:
    coefficient: 1
    fn: l2mae
model:
  activation: silu
  atom_edge_interaction: true
  atom_interaction: true
  cbf:
    name: spherical_harmonics
  cutoff: 12.0
  cutoff_aeaint: 12.0
  cutoff_aint: 12.0
  cutoff_qint: 12.0
  direct_forces: true
  edge_atom_interaction: true
  emb_size_aint_in: 64
  emb_size_aint_out: 64
  emb_size_atom: 256
  emb_size_cbf: 16
  emb_size_edge: 512
  emb_size_quad_in: 32
  emb_size_quad_out: 32
  emb_size_rbf: 16
  emb_size_sbf: 32
  emb_size_trip_in: 64
  emb_size_trip_out: 64
  envelope:
    exponent: 5
    name: polynomial
  extensive: true
  forces_coupled: false
  max_neighbors: 30
  max_neighbors_aeaint: 20
  max_neighbors_aint: 1000
  max_neighbors_qint: 8
  name: gemnet_oc
  num_after_skip: 2
  num_atom: 3
  num_atom_emb_layers: 2
  num_before_skip: 2
  num_blocks: 4
  num_concat: 1
  num_global_out_layers: 2
  num_output_afteratom: 3
  num_radial: 128
  num_spherical: 7
  otf_graph: true
  output_init: HeOrthogonal
  qint_tags:
  - 1
  - 2
  quad_interaction: true
  rbf:
    name: gaussian
  regress_forces: true
  sbf:
    name: legendre_outer
  symmetric_edge_symmetrization: false
optim:
  batch_size: 16
  clip_grad_norm: 10
  ema_decay: 0.999
  energy_coefficient: 1
  eval_batch_size: 16
  eval_every: 5000
  factor: 0.8
  force_coefficient: 1
  load_balancing: atoms
  loss_energy: mae
  loss_force: atomwisel2
  lr_initial: 0.0005
  max_epochs: 80
  mode: min
  num_workers: 2
  optimizer: AdamW
  optimizer_params:
    amsgrad: true
  patience: 3
  scheduler: ReduceLROnPlateau
  weight_decay: 0
outputs:
  energy:
    level: system
  forces:
    eval_on_free_atoms: true
    level: atom
    train_on_free_atoms: true
relax_dataset: {}
slurm:
  additional_parameters:
    constraint: volta32gb
  cpus_per_task: 3
  folder: /checkpoint/abhshkdz/ocp_oct1_logs/57632342
  gpus_per_node: 8
  job_id: '57632342'
  job_name: gnoc_oc22_oc20_all_s2ef
  mem: 480GB
  nodes: 8
  ntasks_per_node: 8
  partition: ocp,learnaccel
  time: 4320
task:
  dataset: oc22_lmdb
  description: Regressing to energies and forces for DFT trajectories from OCP
  eval_on_free_atoms: true
  grad_input: atomic forces
  labels:
  - potential energy
  metric: mae
  primary_metric: forces_mae
  train_on_free_atoms: true
  type: regression
test_dataset: {}
trainer: ocp
val_dataset: {}
INFO:root:Loading model: gemnet_oc
WARNING:root:Unrecognized arguments: ['symmetric_edge_symmetrization']
INFO:root:Loaded GemNetOC with 38864438 parameters.
INFO:root:Loading checkpoint in inference-only mode, not loading keys associated with trainer state!
INFO:root:Overwriting scaling factors with those loaded from checkpoint. If you're generating predictions with a pretrained checkpoint, this is the correct behavior. To disable this, delete `scale_dict` from the checkpoint. 
WARNING:root:No seed has been set in modelcheckpoint or OCPCalculator! Results may not be reproducible on re-run
%%capture
from ase.build import bulk
atoms = bulk('Cu', a=10)
atoms.set_calculator(calc)
atoms.get_potential_energy()
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[2], line 4
      2 atoms = bulk('Cu', a=10)
      3 atoms.set_calculator(calc)
----> 4 atoms.get_potential_energy()

File /opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/ase/atoms.py:755, in Atoms.get_potential_energy(self, force_consistent, apply_constraint)
    752     energy = self._calc.get_potential_energy(
    753         self, force_consistent=force_consistent)
    754 else:
--> 755     energy = self._calc.get_potential_energy(self)
    756 if apply_constraint:
    757     for constraint in self.constraints:

File /opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/ase/calculators/abc.py:24, in GetPropertiesMixin.get_potential_energy(self, atoms, force_consistent)
     22 else:
     23     name = 'energy'
---> 24 return self.get_property(name, atoms)

File /opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/ase/calculators/calculator.py:538, in BaseCalculator.get_property(self, name, atoms, allow_calculation)
    535     if self.use_cache:
    536         self.atoms = atoms.copy()
--> 538     self.calculate(atoms, [name], system_changes)
    540 if name not in self.results:
    541     # For some reason the calculator was not able to do what we want,
    542     # and that is OK.
    543     raise PropertyNotImplementedError(
    544         '{} not present in this ' 'calculation'.format(name)
    545     )

File ~/work/fairchem/fairchem/src/fairchem/core/common/relaxation/ase_utils.py:233, in OCPCalculator.calculate(self, atoms, properties, system_changes)
    230 data_object = self.a2g.convert(atoms)
    231 batch = data_list_collater([data_object], otf_graph=True)
--> 233 predictions = self.trainer.predict(batch, per_image=False, disable_tqdm=True)
    235 for key in predictions:
    236     _pred = predictions[key]

File /opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/torch/utils/_contextlib.py:116, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    113 @functools.wraps(func)
    114 def decorate_context(*args, **kwargs):
    115     with ctx_factory():
--> 116         return func(*args, **kwargs)

File ~/work/fairchem/fairchem/src/fairchem/core/trainers/ocp_trainer.py:462, in OCPTrainer.predict(self, data_loader, per_image, results_file, disable_tqdm)
    454 for _, batch in tqdm(
    455     enumerate(data_loader),
    456     total=len(data_loader),
   (...)
    459     disable=disable_tqdm,
    460 ):
    461     with torch.cuda.amp.autocast(enabled=self.scaler is not None):
--> 462         out = self._forward(batch)
    464     for target_key in self.config["outputs"]:
    465         pred = self._denorm_preds(target_key, out[target_key], batch)

File ~/work/fairchem/fairchem/src/fairchem/core/trainers/ocp_trainer.py:245, in OCPTrainer._forward(self, batch)
    244 def _forward(self, batch):
--> 245     out = self.model(batch.to(self.device))
    247     outputs = {}
    248     batch_size = batch.natoms.numel()

File /opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/torch/nn/modules/module.py:1553, in Module._wrapped_call_impl(self, *args, **kwargs)
   1551     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1552 else:
-> 1553     return self._call_impl(*args, **kwargs)

File /opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/torch/nn/modules/module.py:1562, in Module._call_impl(self, *args, **kwargs)
   1557 # If we don't have any hooks, we want to skip the rest of the logic in
   1558 # this function, and just call forward.
   1559 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1560         or _global_backward_pre_hooks or _global_backward_hooks
   1561         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1562     return forward_call(*args, **kwargs)
   1564 try:
   1565     result = None

File ~/work/fairchem/fairchem/src/fairchem/core/common/utils.py:176, in conditional_grad.<locals>.decorator.<locals>.cls_method(self, *args, **kwargs)
    174 if self.regress_forces and not getattr(self, "direct_forces", 0):
    175     f = dec(func)
--> 176 return f(self, *args, **kwargs)

File ~/work/fairchem/fairchem/src/fairchem/core/models/gemnet_oc/gemnet_oc.py:1218, in GemNetOC.forward(self, data)
   1196 (
   1197     main_graph,
   1198     a2a_graph,
   (...)
   1205     quad_idx,
   1206 ) = self.get_graphs_and_indices(data)
   1207 _, idx_t = main_graph["edge_index"]
   1209 (
   1210     basis_rad_raw,
   1211     basis_atom_update,
   1212     basis_output,
   1213     bases_qint,
   1214     bases_e2e,
   1215     bases_a2e,
   1216     bases_e2a,
   1217     basis_a2a_rad,
-> 1218 ) = self.get_bases(
   1219     main_graph=main_graph,
   1220     a2a_graph=a2a_graph,
   1221     a2ee2a_graph=a2ee2a_graph,
   1222     qint_graph=qint_graph,
   1223     trip_idx_e2e=trip_idx_e2e,
   1224     trip_idx_a2e=trip_idx_a2e,
   1225     trip_idx_e2a=trip_idx_e2a,
   1226     quad_idx=quad_idx,
   1227     num_atoms=num_atoms,
   1228 )
   1230 # Embedding block
   1231 h = self.atom_emb(atomic_numbers)

File ~/work/fairchem/fairchem/src/fairchem/core/models/gemnet_oc/gemnet_oc.py:1091, in GemNetOC.get_bases(self, main_graph, a2a_graph, a2ee2a_graph, qint_graph, trip_idx_e2e, trip_idx_a2e, trip_idx_e2a, quad_idx, num_atoms)
   1082     cosφ_cab_q, cosφ_abd, angle_cabd = self.calculate_quad_angles(
   1083         main_graph["vector"],
   1084         qint_graph["vector"],
   1085         quad_idx,
   1086     )
   1088     basis_rad_cir_qint_raw, basis_cir_qint_raw = self.cbf_basis_qint(
   1089         qint_graph["distance"], cosφ_abd
   1090     )
-> 1091     basis_rad_sph_qint_raw, basis_sph_qint_raw = self.sbf_basis_qint(
   1092         main_graph["distance"],
   1093         cosφ_cab_q[quad_idx["trip_out_to_quad"]],
   1094         angle_cabd,
   1095     )
   1096 if self.atom_edge_interaction:
   1097     basis_rad_a2ee2a_raw = self.radial_basis_aeaint(a2ee2a_graph["distance"])

File /opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/torch/nn/modules/module.py:1553, in Module._wrapped_call_impl(self, *args, **kwargs)
   1551     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1552 else:
-> 1553     return self._call_impl(*args, **kwargs)

File /opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/torch/nn/modules/module.py:1562, in Module._call_impl(self, *args, **kwargs)
   1557 # If we don't have any hooks, we want to skip the rest of the logic in
   1558 # this function, and just call forward.
   1559 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1560         or _global_backward_pre_hooks or _global_backward_hooks
   1561         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1562     return forward_call(*args, **kwargs)
   1564 try:
   1565     result = None

File ~/work/fairchem/fairchem/src/fairchem/core/models/gemnet_oc/layers/spherical_basis.py:132, in SphericalBasisLayer.forward(self, D_ca, cosφ_cab, θ_cabd)
    130 def forward(self, D_ca, cosφ_cab, θ_cabd):
    131     rad_basis = self.radial_basis(D_ca)
--> 132     sph_basis = self.spherical_basis(cosφ_cab, θ_cabd)
    133     # (num_quadruplets, num_spherical**2)
    135     if self.scale_basis:

File ~/work/fairchem/fairchem/src/fairchem/core/models/gemnet_oc/layers/spherical_basis.py:116, in SphericalBasisLayer.__init__.<locals>.<lambda>(cosφ, θ)
    111 elif sbf_name == "legendre_outer":
    112     circular_basis = get_sph_harm_basis(num_spherical, zero_m_only=True)
    113     self.spherical_basis = lambda cosφ, ϑ: (
    114         circular_basis(cosφ)[:, :, None]
    115         * circular_basis(torch.cos(ϑ))[:, None, :]
--> 116     ).reshape(cosφ.shape[0], -1)
    118 elif sbf_name == "gaussian_outer":
    119     self.circular_basis = GaussianBasis(
    120         start=-1, stop=1, num_gaussians=num_spherical, **sbf_hparams
    121     )

RuntimeError: cannot reshape tensor of 0 elements into shape [0, -1] because the unspecified dimension size -1 can be any value and is ambiguous

I get wildly different energies from the different models#

Some models are trained on adsorption energies, and some are trained on total energies. You have to know which one you are using.

Sometimes you can tell from the magnitude of the energies, but you should use care with this. If energies are "small" and near zero, they are likely adsorption energies; if they are "large" in magnitude, they are probably total energies. This can be misleading though, since it depends on the total number of atoms in the system.

# These are to suppress the output from making the calculators.
from io import StringIO
import contextlib

from ase.build import fcc111, add_adsorbate

# Build a Pt(111) slab with an O adsorbate in the fcc hollow site.
slab = fcc111('Pt', size=(2, 2, 5), vacuum=10.0)
add_adsorbate(slab, 'O', height=1.2, position='fcc')
from fairchem.core.models.model_registry import model_name_to_local_file

# OC20 model - trained on adsorption energies
checkpoint_path = model_name_to_local_file('GemNet-OC-S2EF-OC20-All', local_cache='/tmp/fairchem_checkpoints/')

with contextlib.redirect_stdout(StringIO()) as _:
    calc = OCPCalculator(checkpoint_path=checkpoint_path, cpu=False)

slab.set_calculator(calc)
slab.get_potential_energy()
INFO:root:Checking local cache: /tmp/fairchem_checkpoints/ for model GemNet-OC-S2EF-OC20-All
/home/runner/work/fairchem/fairchem/src/fairchem/core/common/relaxation/ase_utils.py:150: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  checkpoint = torch.load(checkpoint_path, map_location=torch.device("cpu"))
WARNING:root:Detected old config, converting to new format. Consider updating to avoid potential incompatibilities.
INFO:root:amp: true
cmd:
  checkpoint_dir: /home/runner/work/fairchem/fairchem/docs/core/checkpoints/2024-11-19-06-21-52
  commit: aa298ac
  identifier: ''
  logs_dir: /home/runner/work/fairchem/fairchem/docs/core/logs/wandb/2024-11-19-06-21-52
  print_every: 100
  results_dir: /home/runner/work/fairchem/fairchem/docs/core/results/2024-11-19-06-21-52
  seed: null
  timestamp_id: 2024-11-19-06-21-52
  version: 0.1.dev1+gaa298ac
dataset:
  format: trajectory_lmdb
  grad_target_mean: 0.0
  grad_target_std: 2.887317180633545
  key_mapping:
    force: forces
    y: energy
  normalize_labels: true
  target_mean: -0.7554450631141663
  target_std: 2.887317180633545
  transforms:
    normalizer:
      energy:
        mean: -0.7554450631141663
        stdev: 2.887317180633545
      forces:
        mean: 0.0
        stdev: 2.887317180633545
evaluation_metrics:
  metrics:
    energy:
    - mae
    forces:
    - forcesx_mae
    - forcesy_mae
    - forcesz_mae
    - mae
    - cosine_similarity
    - magnitude_error
    misc:
    - energy_forces_within_threshold
  primary_metric: forces_mae
gp_gpus: null
gpus: 0
logger: wandb
loss_functions:
- energy:
    coefficient: 1
    fn: mae
- forces:
    coefficient: 100
    fn: l2mae
model:
  activation: silu
  atom_edge_interaction: true
  atom_interaction: true
  cbf:
    name: spherical_harmonics
  cutoff: 12.0
  cutoff_aeaint: 12.0
  cutoff_aint: 12.0
  cutoff_qint: 12.0
  direct_forces: true
  edge_atom_interaction: true
  emb_size_aint_in: 64
  emb_size_aint_out: 64
  emb_size_atom: 256
  emb_size_cbf: 16
  emb_size_edge: 512
  emb_size_quad_in: 32
  emb_size_quad_out: 32
  emb_size_rbf: 16
  emb_size_sbf: 32
  emb_size_trip_in: 64
  emb_size_trip_out: 64
  envelope:
    exponent: 5
    name: polynomial
  extensive: true
  forces_coupled: false
  max_neighbors: 30
  max_neighbors_aeaint: 20
  max_neighbors_aint: 1000
  max_neighbors_qint: 8
  name: gemnet_oc
  num_after_skip: 2
  num_atom: 3
  num_atom_emb_layers: 2
  num_before_skip: 2
  num_blocks: 4
  num_concat: 1
  num_global_out_layers: 2
  num_output_afteratom: 3
  num_radial: 128
  num_spherical: 7
  otf_graph: true
  output_init: HeOrthogonal
  qint_tags:
  - 1
  - 2
  quad_interaction: true
  rbf:
    name: gaussian
  regress_forces: true
  sbf:
    name: legendre_outer
  symmetric_edge_symmetrization: false
optim:
  batch_size: 16
  clip_grad_norm: 10
  ema_decay: 0.999
  energy_coefficient: 1
  eval_batch_size: 16
  eval_every: 5000
  factor: 0.8
  force_coefficient: 100
  load_balancing: atoms
  loss_energy: mae
  loss_force: l2mae
  lr_initial: 0.0005
  max_epochs: 80
  mode: min
  num_workers: 2
  optimizer: AdamW
  optimizer_params:
    amsgrad: true
  patience: 3
  scheduler: ReduceLROnPlateau
  weight_decay: 0
outputs:
  energy:
    level: system
  forces:
    eval_on_free_atoms: true
    level: atom
    train_on_free_atoms: true
relax_dataset: {}
slurm:
  additional_parameters:
    constraint: volta32gb
  cpus_per_task: 3
  folder: /checkpoint/abhshkdz/ocp_oct1_logs/46876566
  gpus_per_node: 8
  job_id: '46876566'
  job_name: gemnet_q_all_fc100
  mem: 480GB
  nodes: 4
  ntasks_per_node: 8
  partition: learnaccel
  time: 4320
task:
  dataset: trajectory_lmdb
  description: Regressing to energies and forces for DFT trajectories from OCP
  eval_on_free_atoms: true
  grad_input: atomic forces
  labels:
  - potential energy
  metric: mae
  primary_metric: forces_mae
  train_on_free_atoms: true
  type: regression
test_dataset: {}
trainer: ocp
val_dataset: {}
INFO:root:Loading model: gemnet_oc
WARNING:root:Unrecognized arguments: ['symmetric_edge_symmetrization']
INFO:root:Loaded GemNetOC with 38864438 parameters.
INFO:root:Loading checkpoint in inference-only mode, not loading keys associated with trainer state!
INFO:root:Overwriting scaling factors with those loaded from checkpoint. If you're generating predictions with a pretrained checkpoint, this is the correct behavior. To disable this, delete `scale_dict` from the checkpoint. 
/home/runner/work/fairchem/fairchem/src/fairchem/core/modules/normalization/normalizer.py:69: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  "mean": torch.tensor(state_dict["mean"]),
WARNING:root:No seed has been set in modelcheckpoint or OCPCalculator! Results may not be reproducible on re-run
/tmp/ipykernel_2701/2356712572.py:11: DeprecationWarning: Please use atoms.calc = calc
  slab.set_calculator(calc)
/home/runner/work/fairchem/fairchem/src/fairchem/core/trainers/ocp_trainer.py:461: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(enabled=self.scaler is not None):
/home/runner/work/fairchem/fairchem/src/fairchem/core/models/gemnet_oc/gemnet_oc.py:1270: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(False):
1.2851653099060059
# An OC22 checkpoint - trained on total energy
checkpoint_path = model_name_to_local_file('GemNet-OC-S2EFS-OC20+OC22', local_cache='/tmp/fairchem_checkpoints/')

with contextlib.redirect_stdout(StringIO()) as _:
    calc = OCPCalculator(checkpoint_path=checkpoint_path, cpu=False)

slab.set_calculator(calc)
slab.get_potential_energy()
INFO:root:Checking local cache: /tmp/fairchem_checkpoints/ for model GemNet-OC-S2EFS-OC20+OC22
(Loading this checkpoint prints the same FutureWarning and configuration dump shown earlier for this checkpoint; output omitted.)
/tmp/ipykernel_2701/2054440827.py:9: DeprecationWarning: Please use atoms.calc = calc
  slab.set_calculator(calc)
/home/runner/work/fairchem/fairchem/src/fairchem/core/trainers/ocp_trainer.py:461: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(enabled=self.scaler is not None):
/home/runner/work/fairchem/fairchem/src/fairchem/core/models/gemnet_oc/gemnet_oc.py:1270: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(False):
-110.40040588378906
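If you want an adsorption energy from a total-energy checkpoint like this one, you have to do the referencing yourself. A minimal sketch, reusing the OC22 calc and slab from the cells above; the oxygen reference energy is a placeholder you must choose consistently with your own scheme:

from ase.build import fcc111

# Total energy of the bare slab with the same total-energy calculator.
bare = fcc111('Pt', size=(2, 2, 5), vacuum=10.0)
bare.calc = calc
e_bare = bare.get_potential_energy()

e_total = slab.get_potential_energy()  # slab + O, computed above

E_ref_O = 0.0  # placeholder: supply your own oxygen reference energy
print(e_total - e_bare - E_ref_O)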
# This eSCN model is trained on adsorption energies
checkpoint_path = model_name_to_local_file('eSCN-L4-M2-Lay12-S2EF-OC20-2M', local_cache='/tmp/fairchem_checkpoints/')

with contextlib.redirect_stdout(StringIO()) as _:
    calc = OCPCalculator(checkpoint_path=checkpoint_path, cpu=False)

slab.set_calculator(calc)
slab.get_potential_energy()
INFO:root:Checking local cache: /tmp/fairchem_checkpoints/ for model eSCN-L4-M2-Lay12-S2EF-OC20-2M
/home/runner/work/fairchem/fairchem/src/fairchem/core/common/relaxation/ase_utils.py:150: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  checkpoint = torch.load(checkpoint_path, map_location=torch.device("cpu"))
WARNING:root:Detected old config, converting to new format. Consider updating to avoid potential incompatibilities.
INFO:root:amp: true
cmd:
  checkpoint_dir: /home/runner/work/fairchem/fairchem/docs/core/checkpoints/2024-11-19-06-21-52
  commit: aa298ac
  identifier: ''
  logs_dir: /home/runner/work/fairchem/fairchem/docs/core/logs/wandb/2024-11-19-06-21-52
  print_every: 100
  results_dir: /home/runner/work/fairchem/fairchem/docs/core/results/2024-11-19-06-21-52
  seed: null
  timestamp_id: 2024-11-19-06-21-52
  version: 0.1.dev1+gaa298ac
dataset:
  format: trajectory_lmdb
  grad_target_mean: 0.0
  grad_target_std: 2.887317180633545
  key_mapping:
    force: forces
    y: energy
  normalize_labels: true
  target_mean: -0.7554450631141663
  target_std: 2.887317180633545
  transforms:
    normalizer:
      energy:
        mean: -0.7554450631141663
        stdev: 2.887317180633545
      forces:
        mean: 0.0
        stdev: 2.887317180633545
evaluation_metrics:
  metrics:
    energy:
    - mae
    forces:
    - forcesx_mae
    - forcesy_mae
    - forcesz_mae
    - mae
    - cosine_similarity
    - magnitude_error
    misc:
    - energy_forces_within_threshold
  primary_metric: forces_mae
gp_gpus: null
gpus: 0
logger: wandb
loss_functions:
- energy:
    coefficient: 2
    fn: mae
- forces:
    coefficient: 100
    fn: l2mae
model:
  basis_width_scalar: 2.0
  cutoff: 12.0
  distance_function: gaussian
  hidden_channels: 256
  lmax_list:
  - 4
  max_neighbors: 20
  mmax_list:
  - 2
  name: escn
  num_layers: 12
  num_sphere_samples: 128
  otf_graph: true
  regress_forces: true
  sphere_channels: 128
  use_pbc: true
optim:
  batch_size: 6
  clip_grad_norm: 20
  ema_decay: 0.999
  energy_coefficient: 2
  eval_batch_size: 6
  eval_every: 5000
  force_coefficient: 100
  loss_energy: mae
  loss_force: l2mae
  lr_gamma: 0.3
  lr_initial: 0.0008
  lr_milestones:
  - 145833
  - 187500
  - 229166
  max_epochs: 12
  num_workers: 8
  optimizer: AdamW
  optimizer_params:
    amsgrad: true
  warmup_factor: 0.2
  warmup_steps: 100
outputs:
  energy:
    level: system
  forces:
    eval_on_free_atoms: true
    level: atom
    train_on_free_atoms: true
relax_dataset: {}
slurm:
  cpus_per_task: 9
  folder: /checkpoint/zitnick/ocp_logs/3710525
  gpus_per_node: 8
  job_id: '3710525'
  job_name: eSCN-L4-M2-Lay12-2M
  mem: 480GB
  nodes: 2
  ntasks_per_node: 8
  partition: learnaccel
  time: 4320
task:
  dataset: trajectory_lmdb
  description: Regressing to energies and forces for DFT trajectories from OCP
  eval_on_free_atoms: true
  grad_input: atomic forces
  labels:
  - potential energy
  metric: mae
  primary_metric: forces_mae
  relax_dataset:
    src: /checkpoint/electrocatalysis/relaxations/features/init_to_relaxed/test/id/data.lmdb
  relax_opt:
    alpha: 70.0
    damping: 1.0
    maxstep: 0.04
    memory: 50
    name: lbfgs
    traj_dir: /checkpoint/zitnick/ocp/mloutputs/scn_relaxations/SCNF72-6-lay12/val_id/
  relaxation_steps: 200
  train_on_free_atoms: true
  type: regression
  write_pos: true
test_dataset: {}
trainer: ocp
val_dataset: {}
INFO:root:Loading model: escn
INFO:root:Loaded eSCN with 36112896 parameters.
INFO:root:Loading checkpoint in inference-only mode, not loading keys associated with trainer state!
/home/runner/work/fairchem/fairchem/src/fairchem/core/modules/normalization/normalizer.py:69: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  "mean": torch.tensor(state_dict["mean"]),
WARNING:root:No seed has been set in modelcheckpoint or OCPCalculator! Results may not be reproducible on re-run
/tmp/ipykernel_2701/1817216860.py:7: DeprecationWarning: Please use atoms.calc = calc
  slab.set_calculator(calc)
/home/runner/work/fairchem/fairchem/src/fairchem/core/trainers/ocp_trainer.py:461: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(enabled=self.scaler is not None):
1.6825480461120605

Miscellaneous warnings#

In general, warnings are not errors.

Unrecognized arguments#

With GemNet models you might see warnings like:

WARNING:root:Unrecognized arguments: ['symmetric_edge_symmetrization']

You can ignore this warning; it is not important for predictions.
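If the volume of INFO/WARNING messages bothers you, you can raise the root logger's level with standard Python logging (this hides all root-logger messages below ERROR, so use it judiciously):

import logging

logging.getLogger().setLevel(logging.ERROR)  # hide INFO and WARNING messages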

Unable to identify ocp trainer#

The trainer is not specified in some checkpoints, and it defaults to forces, which means energy and forces are calculated. This is the default for the ASE OCP calculator, and this warning just alerts you that it is using that default.

WARNING:root:Unable to identify ocp trainer, defaulting to `forces`. Specify the `trainer` argument into OCPCalculator if otherwise.
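As the warning text suggests, you can pass the trainer argument when constructing the calculator to make the choice explicit. A sketch following the warning's wording, reusing checkpoint_path from above:

from fairchem.core.common.relaxation.ase_utils import OCPCalculator

# 'forces' is the default the warning mentions; passing it explicitly silences the warning.
calc = OCPCalculator(checkpoint_path=checkpoint_path, trainer='forces')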

Request entity too large - can’t save your Notebook#

If you run commands that generate a lot of output in a notebook, sometimes the Jupyter notebook will become too large to save. It is kind of sad; the only thing I know to do is delete the output of the offending cell. Then you may be able to save it.

Once you know it happens, a solution is to redirect the output to a file.

This has happened when running training in a notebook where there are too many lines of output, or if you have a lot (20+) of inline images.
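For example, you can capture output with the %%capture magic (used in several cells on this page), or redirect stdout to a file with contextlib; a sketch writing to a hypothetical train.log:

import contextlib

# Send verbose output to a file instead of the notebook, so the .ipynb stays small.
with open('train.log', 'w') as f, contextlib.redirect_stdout(f):
    print('... verbose training output ...')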

You need at least four atoms for molecules with some models#

GemNet in particular seems to require at least four atoms. This appears to be related to its quadruplet interactions between atoms and their neighbors, which need enough atoms to be defined.

%%capture
from fairchem.core.common.relaxation.ase_utils import OCPCalculator
from fairchem.core.models.model_registry import model_name_to_local_file
import os

checkpoint_path = model_name_to_local_file('GemNet-OC-S2EFS-OC20+OC22', local_cache='/tmp/fairchem_checkpoints/')

calc = OCPCalculator(checkpoint_path=checkpoint_path)
INFO:root:Checking local cache: /tmp/fairchem_checkpoints/ for model GemNet-OC-S2EFS-OC20+OC22
(Configuration dump identical to the one shown earlier for this checkpoint; output omitted.)
%%capture
from ase.build import molecule
import numpy as np

atoms = molecule('H2O')  # only 3 atoms, so this fails with GemNet even with tags set
atoms.set_tags(np.ones(len(atoms)))
atoms.set_calculator(calc)
atoms.get_potential_energy()
(The traceback is essentially identical to the one shown above for the gas-phase atom, ending with the same error:)
RuntimeError: cannot reshape tensor of 0 elements into shape [0, -1] because the unspecified dimension size -1 can be any value and is ambiguous

To tag or not?#

Some models use tags to determine which atoms to calculate energies for. For example, GemNet uses tag=1 to indicate that an atom should be calculated. You will get an error with this model if the atoms are untagged, as the cells below demonstrate.
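The fix is to set tags before calculating, as in the earlier H2O cell. A minimal sketch for CH4 (which, unlike H2O, also has enough atoms for GemNet); calc is the GemNet-OC calculator loaded in the cells below:

import numpy as np
from ase.build import molecule

atoms = molecule('CH4')
atoms.set_tags(np.ones(len(atoms)))  # tag=1: atoms the model should calculate
atoms.calc = calc
atoms.get_potential_energy()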

%%capture
from fairchem.core.common.relaxation.ase_utils import OCPCalculator
from fairchem.core.models.model_registry import model_name_to_local_file
import os

checkpoint_path = model_name_to_local_file('GemNet-OC-S2EFS-OC20+OC22', local_cache='/tmp/fairchem_checkpoints/')
calc = OCPCalculator(checkpoint_path=checkpoint_path)
INFO:root:Checking local cache: /tmp/fairchem_checkpoints/ for model GemNet-OC-S2EFS-OC20+OC22
(Configuration dump identical to the one shown earlier for this checkpoint; output omitted.)
%%capture
atoms = molecule('CH4')
atoms.set_calculator(calc)
atoms.get_potential_energy()  # error
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[11], line 3
      1 atoms = molecule('CH4')
      2 atoms.set_calculator(calc)
----> 3 atoms.get_potential_energy()  # error

File /opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/ase/atoms.py:755, in Atoms.get_potential_energy(self, force_consistent, apply_constraint)
    752     energy = self._calc.get_potential_energy(
    753         self, force_consistent=force_consistent)
    754 else:
--> 755     energy = self._calc.get_potential_energy(self)
    756 if apply_constraint:
    757     for constraint in self.constraints:

File /opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/ase/calculators/abc.py:24, in GetPropertiesMixin.get_potential_energy(self, atoms, force_consistent)
     22 else:
     23     name = 'energy'
---> 24 return self.get_property(name, atoms)

File /opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/ase/calculators/calculator.py:538, in BaseCalculator.get_property(self, name, atoms, allow_calculation)
    535     if self.use_cache:
    536         self.atoms = atoms.copy()
--> 538     self.calculate(atoms, [name], system_changes)
    540 if name not in self.results:
    541     # For some reason the calculator was not able to do what we want,
    542     # and that is OK.
    543     raise PropertyNotImplementedError(
    544         '{} not present in this ' 'calculation'.format(name)
    545     )

File ~/work/fairchem/fairchem/src/fairchem/core/common/relaxation/ase_utils.py:233, in OCPCalculator.calculate(self, atoms, properties, system_changes)
    230 data_object = self.a2g.convert(atoms)
    231 batch = data_list_collater([data_object], otf_graph=True)
--> 233 predictions = self.trainer.predict(batch, per_image=False, disable_tqdm=True)
    235 for key in predictions:
    236     _pred = predictions[key]

File /opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/torch/utils/_contextlib.py:116, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    113 @functools.wraps(func)
    114 def decorate_context(*args, **kwargs):
    115     with ctx_factory():
--> 116         return func(*args, **kwargs)

File ~/work/fairchem/fairchem/src/fairchem/core/trainers/ocp_trainer.py:462, in OCPTrainer.predict(self, data_loader, per_image, results_file, disable_tqdm)
    454 for _, batch in tqdm(
    455     enumerate(data_loader),
    456     total=len(data_loader),
   (...)
    459     disable=disable_tqdm,
    460 ):
    461     with torch.cuda.amp.autocast(enabled=self.scaler is not None):
--> 462         out = self._forward(batch)
    464     for target_key in self.config["outputs"]:
    465         pred = self._denorm_preds(target_key, out[target_key], batch)

File ~/work/fairchem/fairchem/src/fairchem/core/trainers/ocp_trainer.py:245, in OCPTrainer._forward(self, batch)
    244 def _forward(self, batch):
--> 245     out = self.model(batch.to(self.device))
    247     outputs = {}
    248     batch_size = batch.natoms.numel()

File /opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/torch/nn/modules/module.py:1553, in Module._wrapped_call_impl(self, *args, **kwargs)
   1551     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1552 else:
-> 1553     return self._call_impl(*args, **kwargs)

File /opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/torch/nn/modules/module.py:1562, in Module._call_impl(self, *args, **kwargs)
   1557 # If we don't have any hooks, we want to skip the rest of the logic in
   1558 # this function, and just call forward.
   1559 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1560         or _global_backward_pre_hooks or _global_backward_hooks
   1561         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1562     return forward_call(*args, **kwargs)
   1564 try:
   1565     result = None

File ~/work/fairchem/fairchem/src/fairchem/core/common/utils.py:176, in conditional_grad.<locals>.decorator.<locals>.cls_method(self, *args, **kwargs)
    174 if self.regress_forces and not getattr(self, "direct_forces", 0):
    175     f = dec(func)
--> 176 return f(self, *args, **kwargs)

File ~/work/fairchem/fairchem/src/fairchem/core/models/gemnet_oc/gemnet_oc.py:1218, in GemNetOC.forward(self, data)
   1196 (
   1197     main_graph,
   1198     a2a_graph,
   (...)
   1205     quad_idx,
   1206 ) = self.get_graphs_and_indices(data)
   1207 _, idx_t = main_graph["edge_index"]
   1209 (
   1210     basis_rad_raw,
   1211     basis_atom_update,
   1212     basis_output,
   1213     bases_qint,
   1214     bases_e2e,
   1215     bases_a2e,
   1216     bases_e2a,
   1217     basis_a2a_rad,
-> 1218 ) = self.get_bases(
   1219     main_graph=main_graph,
   1220     a2a_graph=a2a_graph,
   1221     a2ee2a_graph=a2ee2a_graph,
   1222     qint_graph=qint_graph,
   1223     trip_idx_e2e=trip_idx_e2e,
   1224     trip_idx_a2e=trip_idx_a2e,
   1225     trip_idx_e2a=trip_idx_e2a,
   1226     quad_idx=quad_idx,
   1227     num_atoms=num_atoms,
   1228 )
   1230 # Embedding block
   1231 h = self.atom_emb(atomic_numbers)

File ~/work/fairchem/fairchem/src/fairchem/core/models/gemnet_oc/gemnet_oc.py:1091, in GemNetOC.get_bases(self, main_graph, a2a_graph, a2ee2a_graph, qint_graph, trip_idx_e2e, trip_idx_a2e, trip_idx_e2a, quad_idx, num_atoms)
   1082     cosφ_cab_q, cosφ_abd, angle_cabd = self.calculate_quad_angles(
   1083         main_graph["vector"],
   1084         qint_graph["vector"],
   1085         quad_idx,
   1086     )
   1088     basis_rad_cir_qint_raw, basis_cir_qint_raw = self.cbf_basis_qint(
   1089         qint_graph["distance"], cosφ_abd
   1090     )
-> 1091     basis_rad_sph_qint_raw, basis_sph_qint_raw = self.sbf_basis_qint(
   1092         main_graph["distance"],
   1093         cosφ_cab_q[quad_idx["trip_out_to_quad"]],
   1094         angle_cabd,
   1095     )
   1096 if self.atom_edge_interaction:
   1097     basis_rad_a2ee2a_raw = self.radial_basis_aeaint(a2ee2a_graph["distance"])

File /opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/torch/nn/modules/module.py:1553, in Module._wrapped_call_impl(self, *args, **kwargs)
   1551     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1552 else:
-> 1553     return self._call_impl(*args, **kwargs)

File /opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/torch/nn/modules/module.py:1562, in Module._call_impl(self, *args, **kwargs)
   1557 # If we don't have any hooks, we want to skip the rest of the logic in
   1558 # this function, and just call forward.
   1559 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1560         or _global_backward_pre_hooks or _global_backward_hooks
   1561         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1562     return forward_call(*args, **kwargs)
   1564 try:
   1565     result = None

File ~/work/fairchem/fairchem/src/fairchem/core/models/gemnet_oc/layers/spherical_basis.py:132, in SphericalBasisLayer.forward(self, D_ca, cosφ_cab, θ_cabd)
    130 def forward(self, D_ca, cosφ_cab, θ_cabd):
    131     rad_basis = self.radial_basis(D_ca)
--> 132     sph_basis = self.spherical_basis(cosφ_cab, θ_cabd)
    133     # (num_quadruplets, num_spherical**2)
    135     if self.scale_basis:

File ~/work/fairchem/fairchem/src/fairchem/core/models/gemnet_oc/layers/spherical_basis.py:116, in SphericalBasisLayer.__init__.<locals>.<lambda>(cosφ, θ)
    111 elif sbf_name == "legendre_outer":
    112     circular_basis = get_sph_harm_basis(num_spherical, zero_m_only=True)
    113     self.spherical_basis = lambda cosφ, ϑ: (
    114         circular_basis(cosφ)[:, :, None]
    115         * circular_basis(torch.cos(ϑ))[:, None, :]
--> 116     ).reshape(cosφ.shape[0], -1)
    118 elif sbf_name == "gaussian_outer":
    119     self.circular_basis = GaussianBasis(
    120         start=-1, stop=1, num_gaussians=num_spherical, **sbf_hparams
    121     )

RuntimeError: cannot reshape tensor of 0 elements into shape [0, -1] because the unspecified dimension size -1 can be any value and is ambiguous
import numpy as np
from ase.build import molecule

atoms = molecule('CH4')
atoms.set_tags(np.ones(len(atoms)))  # <- critical line for GemNet-OC
atoms.calc = calc
atoms.get_potential_energy()
/home/runner/work/fairchem/fairchem/src/fairchem/core/models/gemnet_oc/gemnet_oc.py:1270: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(False):
-23.71796226501465
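
Why does setting tags fix this? In the GemNet-OC config printed above, qint_tags: [1, 2] restricts quadruplet interactions to atoms tagged 1 or 2 (the OC20 convention for surface and adsorbate atoms), so an untagged gas-phase molecule yields zero quadruplets and the empty-tensor reshape error in sbf_basis_qint. A minimal sketch of a helper that tags everything before a gas-phase calculation (the tag_all name is ours, not a fairchem API):

import numpy as np
from ase.build import molecule

def tag_all(atoms, tag=1):
    # Tag every atom so tag-dependent interactions (e.g. qint_tags) are built.
    atoms.set_tags(tag * np.ones(len(atoms), dtype=int))
    return atoms

atoms = tag_all(molecule('CH4'))
atoms.calc = calc  # the GemNet-OC calculator loaded above
atoms.get_potential_energy()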

Not all models require tags, though; this EquiformerV2 model does not use them. Whether a model needs tags is a model-dependent detail worth keeping in mind.

from fairchem.core.common.relaxation.ase_utils import OCPCalculator
from fairchem.core.models.model_registry import model_name_to_local_file

checkpoint_path = model_name_to_local_file('EquiformerV2-31M-S2EF-OC20-All+MD', local_cache='/tmp/fairchem_checkpoints/')

calc = OCPCalculator(checkpoint_path=checkpoint_path)
INFO:root:Checking local cache: /tmp/fairchem_checkpoints/ for model EquiformerV2-31M-S2EF-OC20-All+MD
/home/runner/work/fairchem/fairchem/src/fairchem/core/common/relaxation/ase_utils.py:150: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  checkpoint = torch.load(checkpoint_path, map_location=torch.device("cpu"))
WARNING:root:Detected old config, converting to new format. Consider updating to avoid potential incompatibilities.
INFO:root:amp: true
cmd:
  checkpoint_dir: /home/runner/work/fairchem/fairchem/docs/core/checkpoints/2024-11-19-06-21-52
  commit: aa298ac
  identifier: ''
  logs_dir: /home/runner/work/fairchem/fairchem/docs/core/logs/wandb/2024-11-19-06-21-52
  print_every: 100
  results_dir: /home/runner/work/fairchem/fairchem/docs/core/results/2024-11-19-06-21-52
  seed: null
  timestamp_id: 2024-11-19-06-21-52
  version: 0.1.dev1+gaa298ac
dataset:
  format: trajectory_lmdb_v2
  grad_target_mean: 0.0
  grad_target_std: 2.887317180633545
  key_mapping:
    force: forces
    y: energy
  normalize_labels: true
  target_mean: -0.7554450631141663
  target_std: 2.887317180633545
  transforms:
    normalizer:
      energy:
        mean: -0.7554450631141663
        stdev: 2.887317180633545
      forces:
        mean: 0.0
        stdev: 2.887317180633545
evaluation_metrics:
  metrics:
    energy:
    - mae
    forces:
    - forcesx_mae
    - forcesy_mae
    - forcesz_mae
    - mae
    - cosine_similarity
    - magnitude_error
    misc:
    - energy_forces_within_threshold
  primary_metric: forces_mae
gp_gpus: null
gpus: 0
logger: wandb
loss_functions:
- energy:
    coefficient: 4
    fn: mae
- forces:
    coefficient: 100
    fn: l2mae
model:
  alpha_drop: 0.1
  attn_activation: silu
  attn_alpha_channels: 64
  attn_hidden_channels: 64
  attn_value_channels: 16
  distance_function: gaussian
  drop_path_rate: 0.1
  edge_channels: 128
  ffn_activation: silu
  ffn_hidden_channels: 128
  grid_resolution: 18
  lmax_list:
  - 4
  max_neighbors: 20
  max_num_elements: 90
  max_radius: 12.0
  mmax_list:
  - 2
  name: equiformer_v2
  norm_type: layer_norm_sh
  num_distance_basis: 512
  num_heads: 8
  num_layers: 8
  num_sphere_samples: 128
  otf_graph: true
  proj_drop: 0.0
  regress_forces: true
  sphere_channels: 128
  use_atom_edge_embedding: true
  use_gate_act: false
  use_grid_mlp: true
  use_pbc: true
  use_s2_act_attn: false
  weight_init: uniform
optim:
  batch_size: 8
  clip_grad_norm: 100
  ema_decay: 0.999
  energy_coefficient: 4
  eval_batch_size: 8
  eval_every: 10000
  force_coefficient: 100
  grad_accumulation_steps: 1
  load_balancing: atoms
  loss_energy: mae
  loss_force: l2mae
  lr_initial: 0.0004
  max_epochs: 3
  num_workers: 8
  optimizer: AdamW
  optimizer_params:
    weight_decay: 0.001
  scheduler: LambdaLR
  scheduler_params:
    epochs: 1009275
    lambda_type: cosine
    lr: 0.0004
    lr_min_factor: 0.01
    warmup_epochs: 3364.25
    warmup_factor: 0.2
outputs:
  energy:
    level: system
  forces:
    eval_on_free_atoms: true
    level: atom
    train_on_free_atoms: true
relax_dataset: {}
slurm:
  additional_parameters:
    constraint: volta32gb
  cpus_per_task: 9
  folder: /checkpoint/abhshkdz/open-catalyst-project/logs/equiformer_v2/8307793
  gpus_per_node: 8
  job_id: '8307793'
  job_name: eq2s_051701_allmd
  mem: 480GB
  nodes: 8
  ntasks_per_node: 8
  partition: learnaccel
  time: 4320
task:
  dataset: trajectory_lmdb_v2
  eval_on_free_atoms: true
  grad_input: atomic forces
  labels:
  - potential energy
  primary_metric: forces_mae
  train_on_free_atoms: true
test_dataset: {}
trainer: ocp
val_dataset: {}
INFO:root:Loading model: equiformer_v2
WARNING:root:equiformer_v2 (EquiformerV2) class is deprecated in favor of equiformer_v2_backbone_and_heads  (EquiformerV2BackboneAndHeads)
INFO:root:Loaded EquiformerV2 with 31058690 parameters.
INFO:root:Loading checkpoint in inference-only mode, not loading keys associated with trainer state!
/home/runner/work/fairchem/fairchem/src/fairchem/core/modules/normalization/normalizer.py:69: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  "mean": torch.tensor(state_dict["mean"]),
WARNING:root:No seed has been set in modelcheckpoint or OCPCalculator! Results may not be reproducible on re-run
atoms = molecule('CH4')
atoms.calc = calc
atoms.get_potential_energy()
/home/runner/work/fairchem/fairchem/src/fairchem/core/trainers/ocp_trainer.py:461: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(enabled=self.scaler is not None):
-0.4297364354133606

Stochastic simulation results#

Some models (SCN/eSCN/EqV2) are not deterministic, i.e. you can get slightly different answers each time you run them. An example is shown below; see Issue 563 for more discussion. This happens because a random selection of edges is sampled when the graph is built, and a different selection is made each time you run the model.

import numpy as np
from ase.build import fcc111, add_adsorbate

from fairchem.core.common.relaxation.ase_utils import OCPCalculator
from fairchem.core.models.model_registry import model_name_to_local_file

checkpoint_path = model_name_to_local_file('EquiformerV2-31M-S2EF-OC20-All+MD', local_cache='/tmp/fairchem_checkpoints/')
calc = OCPCalculator(checkpoint_path=checkpoint_path, cpu=True)

slab = fcc111('Pt', size=(2, 2, 5), vacuum=10.0)
add_adsorbate(slab, 'O', height=1.2, position='fcc')
slab.calc = calc

results = []
for i in range(10):
    # Call calculate() directly so each iteration triggers a fresh model
    # evaluation; get_potential_energy() alone would return the cached result.
    calc.calculate(slab, ['energy'], None)
    results.append(slab.get_potential_energy())

print(np.mean(results), np.std(results))
for result in results:
    print(result)
(Checkpoint-loading log and configuration output omitted; identical to the EquiformerV2 output shown above.)
/home/runner/work/fairchem/fairchem/src/fairchem/core/trainers/ocp_trainer.py:461: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(enabled=self.scaler is not None):
1.2127599716186523 1.6757281676107005e-06
1.2127604484558105
1.212756633758545
1.2127602100372314
1.2127580642700195
1.212761402130127
1.2127594947814941
1.2127604484558105
1.212759017944336
1.2127611637115479
1.2127628326416016
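
The spread here is tiny (a standard deviation of about 1.7e-6 eV), so for most purposes this noise is negligible. The repeated warning "No seed has been set in modelcheckpoint or OCPCalculator" points at one mitigation: if your fairchem version's OCPCalculator accepts a seed argument (we assume it does here, based on that warning), fixing it should make the random edge sampling reproducible from run to run. A hedged sketch:

from fairchem.core.common.relaxation.ase_utils import OCPCalculator
from fairchem.core.models.model_registry import model_name_to_local_file

checkpoint_path = model_name_to_local_file('EquiformerV2-31M-S2EF-OC20-All+MD', local_cache='/tmp/fairchem_checkpoints/')
# The seed keyword is assumed to be supported by this OCPCalculator version.
calc = OCPCalculator(checkpoint_path=checkpoint_path, cpu=True, seed=0)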

The forces don’t sum to zero#

In DFT, the forces on all the atoms should sum to zero; otherwise, there is a net translational or rotational force on the system. This is not enforced in fairchem models: individual atomic forces are predicted directly, with no constraint that they sum to zero. If the force predictions are accurate, the sum will be close to zero, and you can improve it further by subtracting the mean force from each atom.

from ase.build import fcc111, add_adsorbate

from fairchem.core.common.relaxation.ase_utils import OCPCalculator
from fairchem.core.models.model_registry import model_name_to_local_file

checkpoint_path = model_name_to_local_file('EquiformerV2-31M-S2EF-OC20-All+MD', local_cache='/tmp/fairchem_checkpoints/')
calc = OCPCalculator(checkpoint_path=checkpoint_path, cpu=True)

slab = fcc111('Pt', size=(2, 2, 5), vacuum=10.0)
add_adsorbate(slab, 'O', height=1.2, position='fcc')
slab.calc = calc

f = slab.get_forces()
f.sum(axis=0)  # the net force; not exactly zero
(Checkpoint-loading log and configuration output omitted; identical to the EquiformerV2 output shown above.)
/home/runner/work/fairchem/fairchem/src/fairchem/core/trainers/ocp_trainer.py:461: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(enabled=self.scaler is not None):
array([ 0.01599247,  0.00170968, -0.07197821], dtype=float32)
# This makes them sum closer to zero by removing net translational force
(f - f.mean(axis=0)).sum(axis=0)
array([ 7.4098352e-08, -1.2980308e-07, -1.1920929e-07], dtype=float32)
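
If you need the net force to be numerically zero, e.g. to avoid center-of-mass drift when integrating dynamics, you can wrap this subtraction in a small helper. A minimal sketch (remove_net_force is our name, not a fairchem API):

import numpy as np

def remove_net_force(forces):
    # Subtract the mean force from every atom so the forces sum to ~zero
    # (up to floating-point error), removing the net translational component.
    forces = np.asarray(forces)
    return forces - forces.mean(axis=0)

f_corrected = remove_net_force(slab.get_forces())
print(f_corrected.sum(axis=0))  # ~[0, 0, 0]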