Common gotchas with fairchem
OutOfMemoryError
If you see errors like:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 390.00 MiB (GPU 0; 10.76 GiB total capacity; 9.59 GiB already allocated; 170.06 MiB free; 9.81 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
This means your GPU has run out of memory. A common cause is having several notebooks open that use the GPU, for example because each one has loaded a calculator. Try closing the other notebooks and shutting down their kernels.
The batch size may also be too large to fit in memory. Try reducing it in the YAML config file (optim.batch_size).
We recommend using automatic mixed precision, either with the --amp option to main.py or in config.yml.
If it is an option, you can try a GPU with more memory, or you may be able to split the job over multiple GPUs.
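If you drive training from your own Python script, one way to find a batch size that fits is to retry with a smaller optim.batch_size each time the GPU runs out of memory. The sketch below assumes the config is a plain nested dict and that run is a hypothetical function of yours that builds a trainer from that dict and runs it. The top-level amp key mirrors the "amp: true" entry fairchem prints when it loads a config. With PyTorch you would pass oom_errors=(torch.cuda.OutOfMemoryError,) and on_oom=torch.cuda.empty_cache; the default MemoryError only keeps the example free of dependencies.

```python
import copy
from typing import Any, Callable, Optional


def shrink_batch_until_it_fits(
    config: dict,
    run: Callable[[dict], Any],
    oom_errors: tuple = (MemoryError,),
    on_oom: Optional[Callable[[], None]] = None,
    min_batch_size: int = 1,
):
    """Call run(cfg), halving cfg["optim"]["batch_size"] after each OOM error.

    Returns (result, batch_size_that_worked). The input config is not modified.
    `run` is a hypothetical user function, not part of fairchem.
    """
    cfg = copy.deepcopy(config)
    cfg["amp"] = True  # mixed precision usually lowers activation memory
    while True:
        try:
            return run(cfg), cfg["optim"]["batch_size"]
        except oom_errors:
            batch_size = cfg["optim"]["batch_size"]
            if batch_size <= min_batch_size:
                raise  # already at the floor: re-raise the original error
            if on_oom is not None:
                on_oom()  # e.g. release cached GPU memory before retrying
            cfg["optim"]["batch_size"] = max(min_batch_size, batch_size // 2)
```

Keep in mind that a smaller batch size changes the optimization (learning-rate schedules are often tuned for a given batch size), so treat this as a way to discover a batch size that fits rather than as a silent fix.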
I want the energy of a gas-phase atom
Trying to compute it raises an error like:
RuntimeError: cannot reshape tensor of 0 elements into shape [0, -1] because the unspecified dimension size -1 can be any value and is ambiguous
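The problem is that a single atom has no neighbors to build interactions from. In the GemNet-OC traceback below, this surfaces as zero quadruplets: an empty tensor reaches a reshape(..., -1) call, and the size of the -1 dimension cannot be inferred from zero elements. Whether this happens depends on the model, and for some models there is currently no way to get the energy of an isolated atom.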
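Graph-based models build their inputs from the pairs of atoms that lie within a cutoff radius. The toy radius graph below is a stdlib-only sketch, not fairchem's implementation (which also handles periodic images and caps neighbors with max_neighbors), but it shows why a lone atom is a problem: with nothing inside the cutoff, the edge list is empty, and so is every tensor derived from it. The 12 Å cutoff matches "cutoff: 12.0" in the GemNet-OC config printed on this page.

```python
import itertools
import math


def radius_graph(positions, cutoff):
    """Directed edges (i, j) between distinct atoms at most `cutoff` apart (Å).

    Toy version: no periodic images and no max_neighbors limit.
    """
    return [
        (i, j)
        for i, j in itertools.permutations(range(len(positions)), 2)
        if math.dist(positions[i], positions[j]) <= cutoff
    ]


isolated_atom = [(0.0, 0.0, 0.0)]
dimer = [(0.0, 0.0, 0.0), (0.0, 0.0, 2.5)]
print(radius_graph(isolated_atom, cutoff=12.0))  # [] : nothing to interact with
print(radius_graph(dimer, cutoff=12.0))          # [(0, 1), (1, 0)]
```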
from fairchem.core.common.relaxation.ase_utils import OCPCalculator
from fairchem.core.models.model_registry import model_name_to_local_file
checkpoint_path = model_name_to_local_file('GemNet-OC-S2EFS-OC20+OC22', local_cache='/tmp/fairchem_checkpoints/')
calc = OCPCalculator(checkpoint_path=checkpoint_path)
/home/runner/work/fairchem/fairchem/src/fairchem/core/models/escn/so3.py:23: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
_Jd = torch.load(os.path.join(os.path.dirname(__file__), "Jd.pt"))
/home/runner/work/fairchem/fairchem/src/fairchem/core/models/scn/spherical_harmonics.py:23: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
_Jd = torch.load(os.path.join(os.path.dirname(__file__), "Jd.pt"))
/home/runner/work/fairchem/fairchem/src/fairchem/core/models/equiformer_v2/wigner.py:10: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
_Jd = torch.load(os.path.join(os.path.dirname(__file__), "Jd.pt"))
/home/runner/work/fairchem/fairchem/src/fairchem/core/models/equiformer_v2/layer_norm.py:75: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
@torch.cuda.amp.autocast(enabled=False)
/home/runner/work/fairchem/fairchem/src/fairchem/core/models/equiformer_v2/layer_norm.py:175: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
@torch.cuda.amp.autocast(enabled=False)
/home/runner/work/fairchem/fairchem/src/fairchem/core/models/equiformer_v2/layer_norm.py:263: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
@torch.cuda.amp.autocast(enabled=False)
/home/runner/work/fairchem/fairchem/src/fairchem/core/models/equiformer_v2/layer_norm.py:357: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
@torch.cuda.amp.autocast(enabled=False)
/home/runner/work/fairchem/fairchem/src/fairchem/core/common/relaxation/ase_utils.py:191: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
checkpoint = torch.load(checkpoint_path, map_location=torch.device("cpu"))
WARNING:root:Detected old config, converting to new format. Consider updating to avoid potential incompatibilities.
INFO:root:amp: true
cmd:
checkpoint_dir: /home/runner/work/fairchem/fairchem/docs/core/checkpoints/2024-12-19-04-01-04
commit: 83e1a53
identifier: ''
logs_dir: /home/runner/work/fairchem/fairchem/docs/core/logs/wandb/2024-12-19-04-01-04
print_every: 100
results_dir: /home/runner/work/fairchem/fairchem/docs/core/results/2024-12-19-04-01-04
seed: null
timestamp_id: 2024-12-19-04-01-04
version: 1.4.0
dataset:
format: oc22_lmdb
key_mapping:
force: forces
y: energy
normalize_labels: false
oc20_ref: /checkpoint/janlan/ocp/other_data/final_ref_energies_02_07_2021.pkl
raw_energy_target: true
evaluation_metrics:
metrics:
energy:
- mae
forces:
- forcesx_mae
- forcesy_mae
- forcesz_mae
- mae
- cosine_similarity
- magnitude_error
misc:
- energy_forces_within_threshold
primary_metric: forces_mae
gp_gpus: null
gpus: 0
logger: wandb
loss_functions:
- energy:
coefficient: 1
fn: mae
- forces:
coefficient: 1
fn: l2mae
model:
activation: silu
atom_edge_interaction: true
atom_interaction: true
cbf:
name: spherical_harmonics
cutoff: 12.0
cutoff_aeaint: 12.0
cutoff_aint: 12.0
cutoff_qint: 12.0
direct_forces: true
edge_atom_interaction: true
emb_size_aint_in: 64
emb_size_aint_out: 64
emb_size_atom: 256
emb_size_cbf: 16
emb_size_edge: 512
emb_size_quad_in: 32
emb_size_quad_out: 32
emb_size_rbf: 16
emb_size_sbf: 32
emb_size_trip_in: 64
emb_size_trip_out: 64
envelope:
exponent: 5
name: polynomial
extensive: true
forces_coupled: false
max_neighbors: 30
max_neighbors_aeaint: 20
max_neighbors_aint: 1000
max_neighbors_qint: 8
name: gemnet_oc
num_after_skip: 2
num_atom: 3
num_atom_emb_layers: 2
num_before_skip: 2
num_blocks: 4
num_concat: 1
num_global_out_layers: 2
num_output_afteratom: 3
num_radial: 128
num_spherical: 7
otf_graph: true
output_init: HeOrthogonal
qint_tags:
- 1
- 2
quad_interaction: true
rbf:
name: gaussian
regress_forces: true
sbf:
name: legendre_outer
symmetric_edge_symmetrization: false
optim:
batch_size: 16
clip_grad_norm: 10
ema_decay: 0.999
energy_coefficient: 1
eval_batch_size: 16
eval_every: 5000
factor: 0.8
force_coefficient: 1
load_balancing: atoms
loss_energy: mae
loss_force: atomwisel2
lr_initial: 0.0005
max_epochs: 80
mode: min
num_workers: 2
optimizer: AdamW
optimizer_params:
amsgrad: true
patience: 3
scheduler: ReduceLROnPlateau
weight_decay: 0
outputs:
energy:
level: system
forces:
eval_on_free_atoms: true
level: atom
train_on_free_atoms: true
relax_dataset: {}
slurm:
additional_parameters:
constraint: volta32gb
cpus_per_task: 3
folder: /checkpoint/abhshkdz/ocp_oct1_logs/57632342
gpus_per_node: 8
job_id: '57632342'
job_name: gnoc_oc22_oc20_all_s2ef
mem: 480GB
nodes: 8
ntasks_per_node: 8
partition: ocp,learnaccel
time: 4320
task:
dataset: oc22_lmdb
description: Regressing to energies and forces for DFT trajectories from OCP
eval_on_free_atoms: true
grad_input: atomic forces
labels:
- potential energy
metric: mae
primary_metric: forces_mae
train_on_free_atoms: true
type: regression
test_dataset: {}
trainer: ocp
val_dataset: {}
INFO:root:Loading model: gemnet_oc
WARNING:root:Unrecognized arguments: ['symmetric_edge_symmetrization']
INFO:root:Loaded GemNetOC with 38864438 parameters.
INFO:root:Loading checkpoint in inference-only mode, not loading keys associated with trainer state!
INFO:root:Overwriting scaling factors with those loaded from checkpoint. If you're generating predictions with a pretrained checkpoint, this is the correct behavior. To disable this, delete `scale_dict` from the checkpoint.
WARNING:root:No seed has been set in modelcheckpoint or OCPCalculator! Results may not be reproducible on re-run
%%capture
from ase.build import bulk
atoms = bulk('Cu', a=10)  # one Cu atom in a cell with a 10 Å lattice constant
atoms.calc = calc
atoms.get_potential_energy()
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[2], line 4
2 atoms = bulk('Cu', a=10)
3 atoms.set_calculator(calc)
----> 4 atoms.get_potential_energy()
File /opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/site-packages/ase/atoms.py:755, in Atoms.get_potential_energy(self, force_consistent, apply_constraint)
752 energy = self._calc.get_potential_energy(
753 self, force_consistent=force_consistent)
754 else:
--> 755 energy = self._calc.get_potential_energy(self)
756 if apply_constraint:
757 for constraint in self.constraints:
File /opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/site-packages/ase/calculators/abc.py:24, in GetPropertiesMixin.get_potential_energy(self, atoms, force_consistent)
22 else:
23 name = 'energy'
---> 24 return self.get_property(name, atoms)
File /opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/site-packages/ase/calculators/calculator.py:538, in BaseCalculator.get_property(self, name, atoms, allow_calculation)
535 if self.use_cache:
536 self.atoms = atoms.copy()
--> 538 self.calculate(atoms, [name], system_changes)
540 if name not in self.results:
541 # For some reason the calculator was not able to do what we want,
542 # and that is OK.
543 raise PropertyNotImplementedError(
544 '{} not present in this ' 'calculation'.format(name)
545 )
File ~/work/fairchem/fairchem/src/fairchem/core/common/relaxation/ase_utils.py:292, in OCPCalculator.calculate(self, atoms, properties, system_changes)
289 else:
290 batch = atoms
--> 292 predictions = self.trainer.predict(batch, per_image=False, disable_tqdm=True)
294 for key in predictions:
295 _pred = predictions[key]
File /opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/site-packages/torch/utils/_contextlib.py:116, in context_decorator.<locals>.decorate_context(*args, **kwargs)
113 @functools.wraps(func)
114 def decorate_context(*args, **kwargs):
115 with ctx_factory():
--> 116 return func(*args, **kwargs)
File ~/work/fairchem/fairchem/src/fairchem/core/trainers/ocp_trainer.py:473, in OCPTrainer.predict(self, data_loader, per_image, results_file, disable_tqdm)
465 for _, batch in tqdm(
466 enumerate(data_loader),
467 total=len(data_loader),
(...)
470 disable=disable_tqdm,
471 ):
472 with torch.cuda.amp.autocast(enabled=self.scaler is not None):
--> 473 out = self._forward(batch)
475 for target_key in self.config["outputs"]:
476 pred = self._denorm_preds(target_key, out[target_key], batch)
File ~/work/fairchem/fairchem/src/fairchem/core/trainers/ocp_trainer.py:254, in OCPTrainer._forward(self, batch)
253 def _forward(self, batch):
--> 254 out = self.model(batch.to(self.device))
256 outputs = {}
257 batch_size = batch.natoms.numel()
File /opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/site-packages/torch/nn/modules/module.py:1553, in Module._wrapped_call_impl(self, *args, **kwargs)
1551 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1552 else:
-> 1553 return self._call_impl(*args, **kwargs)
File /opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/site-packages/torch/nn/modules/module.py:1562, in Module._call_impl(self, *args, **kwargs)
1557 # If we don't have any hooks, we want to skip the rest of the logic in
1558 # this function, and just call forward.
1559 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1560 or _global_backward_pre_hooks or _global_backward_hooks
1561 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1562 return forward_call(*args, **kwargs)
1564 try:
1565 result = None
File ~/work/fairchem/fairchem/src/fairchem/core/common/utils.py:176, in conditional_grad.<locals>.decorator.<locals>.cls_method(self, *args, **kwargs)
174 if self.regress_forces and not getattr(self, "direct_forces", 0):
175 f = dec(func)
--> 176 return f(self, *args, **kwargs)
File ~/work/fairchem/fairchem/src/fairchem/core/models/gemnet_oc/gemnet_oc.py:1218, in GemNetOC.forward(self, data)
1196 (
1197 main_graph,
1198 a2a_graph,
(...)
1205 quad_idx,
1206 ) = self.get_graphs_and_indices(data)
1207 _, idx_t = main_graph["edge_index"]
1209 (
1210 basis_rad_raw,
1211 basis_atom_update,
1212 basis_output,
1213 bases_qint,
1214 bases_e2e,
1215 bases_a2e,
1216 bases_e2a,
1217 basis_a2a_rad,
-> 1218 ) = self.get_bases(
1219 main_graph=main_graph,
1220 a2a_graph=a2a_graph,
1221 a2ee2a_graph=a2ee2a_graph,
1222 qint_graph=qint_graph,
1223 trip_idx_e2e=trip_idx_e2e,
1224 trip_idx_a2e=trip_idx_a2e,
1225 trip_idx_e2a=trip_idx_e2a,
1226 quad_idx=quad_idx,
1227 num_atoms=num_atoms,
1228 )
1230 # Embedding block
1231 h = self.atom_emb(atomic_numbers)
File ~/work/fairchem/fairchem/src/fairchem/core/models/gemnet_oc/gemnet_oc.py:1091, in GemNetOC.get_bases(self, main_graph, a2a_graph, a2ee2a_graph, qint_graph, trip_idx_e2e, trip_idx_a2e, trip_idx_e2a, quad_idx, num_atoms)
1082 cosφ_cab_q, cosφ_abd, angle_cabd = self.calculate_quad_angles(
1083 main_graph["vector"],
1084 qint_graph["vector"],
1085 quad_idx,
1086 )
1088 basis_rad_cir_qint_raw, basis_cir_qint_raw = self.cbf_basis_qint(
1089 qint_graph["distance"], cosφ_abd
1090 )
-> 1091 basis_rad_sph_qint_raw, basis_sph_qint_raw = self.sbf_basis_qint(
1092 main_graph["distance"],
1093 cosφ_cab_q[quad_idx["trip_out_to_quad"]],
1094 angle_cabd,
1095 )
1096 if self.atom_edge_interaction:
1097 basis_rad_a2ee2a_raw = self.radial_basis_aeaint(a2ee2a_graph["distance"])
File /opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/site-packages/torch/nn/modules/module.py:1553, in Module._wrapped_call_impl(self, *args, **kwargs)
1551 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1552 else:
-> 1553 return self._call_impl(*args, **kwargs)
File /opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/site-packages/torch/nn/modules/module.py:1562, in Module._call_impl(self, *args, **kwargs)
1557 # If we don't have any hooks, we want to skip the rest of the logic in
1558 # this function, and just call forward.
1559 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1560 or _global_backward_pre_hooks or _global_backward_hooks
1561 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1562 return forward_call(*args, **kwargs)
1564 try:
1565 result = None
File ~/work/fairchem/fairchem/src/fairchem/core/models/gemnet_oc/layers/spherical_basis.py:132, in SphericalBasisLayer.forward(self, D_ca, cosφ_cab, θ_cabd)
130 def forward(self, D_ca, cosφ_cab, θ_cabd):
131 rad_basis = self.radial_basis(D_ca)
--> 132 sph_basis = self.spherical_basis(cosφ_cab, θ_cabd)
133 # (num_quadruplets, num_spherical**2)
135 if self.scale_basis:
File ~/work/fairchem/fairchem/src/fairchem/core/models/gemnet_oc/layers/spherical_basis.py:116, in SphericalBasisLayer.__init__.<locals>.<lambda>(cosφ, θ)
111 elif sbf_name == "legendre_outer":
112 circular_basis = get_sph_harm_basis(num_spherical, zero_m_only=True)
113 self.spherical_basis = lambda cosφ, ϑ: (
114 circular_basis(cosφ)[:, :, None]
115 * circular_basis(torch.cos(ϑ))[:, None, :]
--> 116 ).reshape(cosφ.shape[0], -1)
118 elif sbf_name == "gaussian_outer":
119 self.circular_basis = GaussianBasis(
120 start=-1, stop=1, num_gaussians=num_spherical, **sbf_hparams
121 )
RuntimeError: cannot reshape tensor of 0 elements into shape [0, -1] because the unspecified dimension size -1 can be any value and is ambiguous
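The last frame calls reshape(cosφ.shape[0], -1) with cosφ.shape[0] equal to 0, that is, zero quadruplets. The pure-Python sketch below mimics how a -1 dimension is inferred (it is not PyTorch's code) and shows why this is ambiguous: the -1 size is the element count divided by the product of the explicit sizes, and when both are 0, any size fits. Reshaping to an explicit size such as (0, 49), where 49 is num_spherical**2 with num_spherical: 7 from the config, has no such problem.

```python
from math import prod


def infer_shape(numel, shape):
    """Resolve a single -1 in `shape` for a tensor with `numel` elements."""
    if shape.count(-1) > 1:
        raise ValueError("only one dimension can be -1")
    known = prod(d for d in shape if d != -1)  # product of the explicit sizes
    if -1 not in shape:
        if known != numel:
            raise ValueError(f"shape {list(shape)} is invalid for {numel} elements")
        return tuple(shape)
    if known == 0:
        # numel == known * k holds for every k (or for none), so k cannot be chosen.
        raise ValueError(
            f"cannot reshape tensor of {numel} elements into shape {list(shape)}: "
            "the -1 dimension is ambiguous"
        )
    if numel % known != 0:
        raise ValueError(f"shape {list(shape)} is invalid for {numel} elements")
    return tuple(numel // known if d == -1 else d for d in shape)


print(infer_shape(3 * 49, (3, -1)))  # (3, 49): three quadruplets
print(infer_shape(0, (0, 49)))       # (0, 49): an explicit size works even when empty
```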
I get wildly different energies from different models
Some models are trained on adsorption energies, and some are trained on total energies. You have to know which one you are using.
You can sometimes tell from the magnitude of the energies, but use care. Energies that are “small” and near zero are likely adsorption energies; energies that are “large” in magnitude are probably total energies. This can be misleading, though, because total energies scale with the number of atoms in the system.
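Dividing by the number of atoms removes some of that size dependence. The sketch below is only a heuristic: the 1 eV/atom threshold is an arbitrary assumption, not a fairchem convention, and it can still misclassify small systems. The two energies are the GemNet-OC predictions for the Pt(111) slab with an O adsorbate computed further down this page (2 × 2 × 5 Pt atoms plus one O, so 21 atoms).

```python
def energy_kind_hint(energy_ev, n_atoms, threshold_ev_per_atom=1.0):
    """Very rough guess at whether `energy_ev` is a total or an adsorption energy.

    threshold_ev_per_atom is an arbitrary cutoff chosen for illustration.
    """
    per_atom = energy_ev / n_atoms
    if abs(per_atom) >= threshold_ev_per_atom:
        return "probably a total energy"
    return "possibly an adsorption energy"


n_atoms = 2 * 2 * 5 + 1  # fcc111('Pt', size=(2, 2, 5)) plus one O adsorbate
print(energy_kind_hint(1.2851653099060059, n_atoms))   # possibly an adsorption energy
print(energy_kind_hint(-110.40040588378906, n_atoms))  # probably a total energy
```

The reliable answer still comes from knowing what the model was trained on.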
# Redirecting stdout hides text printed while building the calculators; logging and warnings go to stderr, so they still appear.
from io import StringIO
import contextlib
from ase.build import fcc111, add_adsorbate
slab = fcc111('Pt', size=(2, 2, 5), vacuum=10.0)
add_adsorbate(slab, 'O', height=1.2, position='fcc')
from fairchem.core.models.model_registry import model_name_to_local_file
# OC20 model - trained on adsorption energies
checkpoint_path = model_name_to_local_file('GemNet-OC-S2EF-OC20-All', local_cache='/tmp/fairchem_checkpoints/')
with contextlib.redirect_stdout(StringIO()) as _:
    calc = OCPCalculator(checkpoint_path=checkpoint_path, cpu=False)
slab.calc = calc
slab.get_potential_energy()
INFO:root:Checking local cache: /tmp/fairchem_checkpoints/ for model GemNet-OC-S2EF-OC20-All
WARNING:root:Detected old config, converting to new format. Consider updating to avoid potential incompatibilities.
INFO:root:amp: true
cmd:
checkpoint_dir: /home/runner/work/fairchem/fairchem/docs/core/checkpoints/2024-12-19-04-01-04
commit: 83e1a53
identifier: ''
logs_dir: /home/runner/work/fairchem/fairchem/docs/core/logs/wandb/2024-12-19-04-01-04
print_every: 100
results_dir: /home/runner/work/fairchem/fairchem/docs/core/results/2024-12-19-04-01-04
seed: null
timestamp_id: 2024-12-19-04-01-04
version: 1.4.0
dataset:
format: trajectory_lmdb
grad_target_mean: 0.0
grad_target_std: 2.887317180633545
key_mapping:
force: forces
y: energy
normalize_labels: true
target_mean: -0.7554450631141663
target_std: 2.887317180633545
transforms:
normalizer:
energy:
mean: -0.7554450631141663
stdev: 2.887317180633545
forces:
mean: 0.0
stdev: 2.887317180633545
evaluation_metrics:
metrics:
energy:
- mae
forces:
- forcesx_mae
- forcesy_mae
- forcesz_mae
- mae
- cosine_similarity
- magnitude_error
misc:
- energy_forces_within_threshold
primary_metric: forces_mae
gp_gpus: null
gpus: 0
logger: wandb
loss_functions:
- energy:
coefficient: 1
fn: mae
- forces:
coefficient: 100
fn: l2mae
model:
activation: silu
atom_edge_interaction: true
atom_interaction: true
cbf:
name: spherical_harmonics
cutoff: 12.0
cutoff_aeaint: 12.0
cutoff_aint: 12.0
cutoff_qint: 12.0
direct_forces: true
edge_atom_interaction: true
emb_size_aint_in: 64
emb_size_aint_out: 64
emb_size_atom: 256
emb_size_cbf: 16
emb_size_edge: 512
emb_size_quad_in: 32
emb_size_quad_out: 32
emb_size_rbf: 16
emb_size_sbf: 32
emb_size_trip_in: 64
emb_size_trip_out: 64
envelope:
exponent: 5
name: polynomial
extensive: true
forces_coupled: false
max_neighbors: 30
max_neighbors_aeaint: 20
max_neighbors_aint: 1000
max_neighbors_qint: 8
name: gemnet_oc
num_after_skip: 2
num_atom: 3
num_atom_emb_layers: 2
num_before_skip: 2
num_blocks: 4
num_concat: 1
num_global_out_layers: 2
num_output_afteratom: 3
num_radial: 128
num_spherical: 7
otf_graph: true
output_init: HeOrthogonal
qint_tags:
- 1
- 2
quad_interaction: true
rbf:
name: gaussian
regress_forces: true
sbf:
name: legendre_outer
symmetric_edge_symmetrization: false
optim:
batch_size: 16
clip_grad_norm: 10
ema_decay: 0.999
energy_coefficient: 1
eval_batch_size: 16
eval_every: 5000
factor: 0.8
force_coefficient: 100
load_balancing: atoms
loss_energy: mae
loss_force: l2mae
lr_initial: 0.0005
max_epochs: 80
mode: min
num_workers: 2
optimizer: AdamW
optimizer_params:
amsgrad: true
patience: 3
scheduler: ReduceLROnPlateau
weight_decay: 0
outputs:
energy:
level: system
forces:
eval_on_free_atoms: true
level: atom
train_on_free_atoms: true
relax_dataset: {}
slurm:
additional_parameters:
constraint: volta32gb
cpus_per_task: 3
folder: /checkpoint/abhshkdz/ocp_oct1_logs/46876566
gpus_per_node: 8
job_id: '46876566'
job_name: gemnet_q_all_fc100
mem: 480GB
nodes: 4
ntasks_per_node: 8
partition: learnaccel
time: 4320
task:
dataset: trajectory_lmdb
description: Regressing to energies and forces for DFT trajectories from OCP
eval_on_free_atoms: true
grad_input: atomic forces
labels:
- potential energy
metric: mae
primary_metric: forces_mae
train_on_free_atoms: true
type: regression
test_dataset: {}
trainer: ocp
val_dataset: {}
INFO:root:Loading model: gemnet_oc
WARNING:root:Unrecognized arguments: ['symmetric_edge_symmetrization']
INFO:root:Loaded GemNetOC with 38864438 parameters.
INFO:root:Loading checkpoint in inference-only mode, not loading keys associated with trainer state!
INFO:root:Overwriting scaling factors with those loaded from checkpoint. If you're generating predictions with a pretrained checkpoint, this is the correct behavior. To disable this, delete `scale_dict` from the checkpoint.
/home/runner/work/fairchem/fairchem/src/fairchem/core/modules/normalization/normalizer.py:69: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
"mean": torch.tensor(state_dict["mean"]),
WARNING:root:No seed has been set in modelcheckpoint or OCPCalculator! Results may not be reproducible on re-run
/tmp/ipykernel_2669/2356712572.py:11: DeprecationWarning: Please use atoms.calc = calc
slab.set_calculator(calc)
/home/runner/work/fairchem/fairchem/src/fairchem/core/trainers/ocp_trainer.py:472: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
with torch.cuda.amp.autocast(enabled=self.scaler is not None):
/home/runner/work/fairchem/fairchem/src/fairchem/core/models/gemnet_oc/gemnet_oc.py:1270: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
with torch.cuda.amp.autocast(False):
1.2851653099060059
# An OC22 checkpoint - trained on total energy
checkpoint_path = model_name_to_local_file('GemNet-OC-S2EFS-OC20+OC22', local_cache='/tmp/fairchem_checkpoints/')
with contextlib.redirect_stdout(StringIO()) as _:
    calc = OCPCalculator(checkpoint_path=checkpoint_path, cpu=False)
slab.calc = calc
slab.get_potential_energy()
INFO:root:Checking local cache: /tmp/fairchem_checkpoints/ for model GemNet-OC-S2EFS-OC20+OC22
-110.40040588378906
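A total energy like this one is not comparable to the adsorption energy from the OC20 model above until the clean slab and gas-phase references are subtracted. A minimal sketch of that bookkeeping, assuming all energies come from the same total-energy model and that you supply per-element gas-phase reference energies yourself (the function and argument names are hypothetical, not part of fairchem, and the numbers in the usage line are placeholders, not real energies):

```python
def adsorption_energy(e_system, e_slab, adsorbate_counts, gas_refs):
    """E_ads = E(slab + adsorbate) - E(slab) - sum_i n_i * E_ref,i, all in eV.

    adsorbate_counts: atoms of each element in the adsorbate, e.g. {"O": 1}.
    gas_refs: per-element gas-phase reference energies (your choice of references).
    """
    e_ref = sum(n * gas_refs[element] for element, n in adsorbate_counts.items())
    return e_system - e_slab - e_ref


# Placeholder numbers for illustration only.
print(adsorption_energy(-110.0, -104.0, {"O": 1}, {"O": -5.0}))  # -1.0
```

Which references a model implicitly assumes depends on its training data, so check the dataset documentation rather than reusing arbitrary numbers.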
# This eSCN model is trained on adsorption energies
checkpoint_path = model_name_to_local_file('eSCN-L4-M2-Lay12-S2EF-OC20-2M', local_cache='/tmp/fairchem_checkpoints/')
with contextlib.redirect_stdout(StringIO()) as _:
    calc = OCPCalculator(checkpoint_path=checkpoint_path, cpu=False)
slab.calc = calc
slab.get_potential_energy()
INFO:root:Checking local cache: /tmp/fairchem_checkpoints/ for model eSCN-L4-M2-Lay12-S2EF-OC20-2M
WARNING:root:Detected old config, converting to new format. Consider updating to avoid potential incompatibilities.
INFO:root:amp: true
INFO:root:Loading model: escn
INFO:root:Loaded eSCN with 36112896 parameters.
INFO:root:Loading checkpoint in inference-only mode, not loading keys associated with trainer state!
WARNING:root:No seed has been set in modelcheckpoint or OCPCalculator! Results may not be reproducible on re-run
1.6843576431274414
Miscellaneous warnings#
In general, warnings are not errors.
Unrecognized arguments#
With GemNet models you might see warnings like:
WARNING:root:Unrecognized arguments: ['symmetric_edge_symmetrization']
You can ignore this warning; it does not matter for predictions.
Unable to identify ocp trainer#
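Some checkpoints do not specify the trainer, in which case it defaults to `forces`, meaning both energies and forces are calculated. This is the default for the ASE OCP calculator, and the warning only alerts you that it is being set. As the message says, you can pass the `trainer` argument to OCPCalculator if you need something else.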
WARNING:root:Unable to identify ocp trainer, defaulting to `forces`. Specify the `trainer` argument into OCPCalculator if otherwise.
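If these warnings clutter a notebook, you can hide them around a single call. The sketch below uses only the Python standard library. It assumes that the `WARNING:root:...` messages are emitted through the root logger, which the `root` prefix suggests. `quietly` is a hypothetical helper name and is not part of fairchem. `warnings.catch_warnings` hides Python warnings such as the torch `FutureWarning`s, and temporarily raising the root logger's level hides log records below `ERROR`.

```python
import contextlib
import logging
import warnings


@contextlib.contextmanager
def quietly(level=logging.ERROR):
    """Temporarily hide Python warnings and root-logger records below `level`."""
    root = logging.getLogger()
    old_level = root.level
    root.setLevel(level)
    try:
        # catch_warnings restores the previous warning filters on exit
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            yield
    finally:
        # Restore the previous logger level even if the body raised
        root.setLevel(old_level)
```

For example, `with quietly(): calc = OCPCalculator(checkpoint_path=checkpoint_path)` should load a checkpoint without printing these messages. The previous logger level is restored on exit, so later warnings still appear.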
Request entity too large - can’t save your Notebook#
If you run commands that generate a lot of output in a notebook, the Jupyter notebook can become too large to save. Unfortunately, the only fix I know of is to delete the output of the offending cell, after which you may be able to save the notebook.
Once you know this can happen, a better approach is to redirect the output to a file.
This has happened when running training in a notebook that produces too many lines of output, or when the notebook contains many (20+) inline images.
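Here is a minimal sketch of redirecting output to a file, using only the Python standard library. `run_with_output_to_file` is a hypothetical helper and is not part of fairchem or Jupyter. It assumes the noisy output comes from `print`, from Python warnings (written to stderr), or from the root logger. The `INFO:root:` prefixes in the logs on this page suggest that fairchem uses the root logger. Redirecting `sys.stderr` alone does not catch log records, because a `logging.StreamHandler` keeps the stream it was created with. For that reason, the helper also points the root logger at the file while the call runs.

```python
import contextlib
import logging


def run_with_output_to_file(path, func, *args, **kwargs):
    """Call func(*args, **kwargs) and append its stdout, stderr and
    root-logger output to the file at `path` instead of the notebook."""
    root = logging.getLogger()
    old_handlers = root.handlers[:]
    with open(path, "a") as f:
        # A StreamHandler keeps the stream it was created with, so redirecting
        # sys.stderr alone would not capture log records; log to the file too.
        root.handlers = [logging.StreamHandler(f)]
        try:
            with contextlib.redirect_stdout(f), contextlib.redirect_stderr(f):
                return func(*args, **kwargs)
        finally:
            # Put the original handlers back even if func raises
            root.handlers = old_handlers
```

For example, `calc = run_with_output_to_file('load.log', OCPCalculator, checkpoint_path=checkpoint_path)` would write the checkpoint-loading messages to `load.log`. The same pattern applies to a long training call.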
You need at least four atoms for molecules with some models#
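GemNet in particular seems to require at least four atoms. This has to do with interactions between atoms and their neighbors: GemNet-OC includes quadruplet interactions (`quad_interaction: true` in the checkpoint config), which involve four atoms at a time. With fewer atoms there are presumably no quadruplets, so the quadruplet basis (`sbf_basis_qint` in the traceback) receives an empty tensor and the final reshape fails.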
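The failure surfaces deep inside the model as an opaque reshape error, so a pre-flight check can make it easier to diagnose. This is a minimal sketch. `check_min_atoms` is a hypothetical helper, and the default threshold of four atoms reflects the behaviour described here rather than a documented fairchem limit.

```python
def check_min_atoms(atoms, min_atoms=4):
    """Return len(atoms), or raise a readable ValueError if the system has
    fewer than `min_atoms` atoms (works with ase.Atoms or any sized object)."""
    n_atoms = len(atoms)
    if n_atoms < min_atoms:
        # Fail early with a clear message instead of the opaque
        # "cannot reshape tensor of 0 elements" error from inside the model.
        raise ValueError(
            f"got {n_atoms} atoms, but this model appears to need at least "
            f"{min_atoms} (e.g. GemNet-OC with quadruplet interactions)"
        )
    return n_atoms
```

Call it on the `Atoms` object before attaching the calculator. For example, `check_min_atoms(molecule('H2O'))` would raise for the three-atom water molecule.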
%%capture
from fairchem.core.common.relaxation.ase_utils import OCPCalculator
from fairchem.core.models.model_registry import model_name_to_local_file
checkpoint_path = model_name_to_local_file('GemNet-OC-S2EFS-OC20+OC22', local_cache='/tmp/fairchem_checkpoints/')
calc = OCPCalculator(checkpoint_path=checkpoint_path)
INFO:root:Checking local cache: /tmp/fairchem_checkpoints/ for model GemNet-OC-S2EFS-OC20+OC22
WARNING:root:Detected old config, converting to new format. Consider updating to avoid potential incompatibilities.
INFO:root:amp: true
INFO:root:Loading model: gemnet_oc
WARNING:root:Unrecognized arguments: ['symmetric_edge_symmetrization']
INFO:root:Loaded GemNetOC with 38864438 parameters.
INFO:root:Loading checkpoint in inference-only mode, not loading keys associated with trainer state!
INFO:root:Overwriting scaling factors with those loaded from checkpoint. If you're generating predictions with a pretrained checkpoint, this is the correct behavior. To disable this, delete `scale_dict` from the checkpoint.
WARNING:root:No seed has been set in modelcheckpoint or OCPCalculator! Results may not be reproducible on re-run
%%capture
from ase.build import molecule
import numpy as np
atoms = molecule('H2O')
atoms.set_tags(np.ones(len(atoms)))
atoms.set_calculator(calc)
atoms.get_potential_energy()
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[9], line 7
5 atoms.set_tags(np.ones(len(atoms)))
6 atoms.set_calculator(calc)
----> 7 atoms.get_potential_energy()
File /opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/site-packages/ase/atoms.py:755, in Atoms.get_potential_energy(self, force_consistent, apply_constraint)
752 energy = self._calc.get_potential_energy(
753 self, force_consistent=force_consistent)
754 else:
--> 755 energy = self._calc.get_potential_energy(self)
756 if apply_constraint:
757 for constraint in self.constraints:
File /opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/site-packages/ase/calculators/abc.py:24, in GetPropertiesMixin.get_potential_energy(self, atoms, force_consistent)
22 else:
23 name = 'energy'
---> 24 return self.get_property(name, atoms)
File /opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/site-packages/ase/calculators/calculator.py:538, in BaseCalculator.get_property(self, name, atoms, allow_calculation)
535 if self.use_cache:
536 self.atoms = atoms.copy()
--> 538 self.calculate(atoms, [name], system_changes)
540 if name not in self.results:
541 # For some reason the calculator was not able to do what we want,
542 # and that is OK.
543 raise PropertyNotImplementedError(
544 '{} not present in this ' 'calculation'.format(name)
545 )
File ~/work/fairchem/fairchem/src/fairchem/core/common/relaxation/ase_utils.py:292, in OCPCalculator.calculate(self, atoms, properties, system_changes)
289 else:
290 batch = atoms
--> 292 predictions = self.trainer.predict(batch, per_image=False, disable_tqdm=True)
294 for key in predictions:
295 _pred = predictions[key]
File /opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/site-packages/torch/utils/_contextlib.py:116, in context_decorator.<locals>.decorate_context(*args, **kwargs)
113 @functools.wraps(func)
114 def decorate_context(*args, **kwargs):
115 with ctx_factory():
--> 116 return func(*args, **kwargs)
File ~/work/fairchem/fairchem/src/fairchem/core/trainers/ocp_trainer.py:473, in OCPTrainer.predict(self, data_loader, per_image, results_file, disable_tqdm)
465 for _, batch in tqdm(
466 enumerate(data_loader),
467 total=len(data_loader),
(...)
470 disable=disable_tqdm,
471 ):
472 with torch.cuda.amp.autocast(enabled=self.scaler is not None):
--> 473 out = self._forward(batch)
475 for target_key in self.config["outputs"]:
476 pred = self._denorm_preds(target_key, out[target_key], batch)
File ~/work/fairchem/fairchem/src/fairchem/core/trainers/ocp_trainer.py:254, in OCPTrainer._forward(self, batch)
253 def _forward(self, batch):
--> 254 out = self.model(batch.to(self.device))
256 outputs = {}
257 batch_size = batch.natoms.numel()
File /opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/site-packages/torch/nn/modules/module.py:1553, in Module._wrapped_call_impl(self, *args, **kwargs)
1551 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1552 else:
-> 1553 return self._call_impl(*args, **kwargs)
File /opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/site-packages/torch/nn/modules/module.py:1562, in Module._call_impl(self, *args, **kwargs)
1557 # If we don't have any hooks, we want to skip the rest of the logic in
1558 # this function, and just call forward.
1559 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1560 or _global_backward_pre_hooks or _global_backward_hooks
1561 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1562 return forward_call(*args, **kwargs)
1564 try:
1565 result = None
File ~/work/fairchem/fairchem/src/fairchem/core/common/utils.py:176, in conditional_grad.<locals>.decorator.<locals>.cls_method(self, *args, **kwargs)
174 if self.regress_forces and not getattr(self, "direct_forces", 0):
175 f = dec(func)
--> 176 return f(self, *args, **kwargs)
File ~/work/fairchem/fairchem/src/fairchem/core/models/gemnet_oc/gemnet_oc.py:1218, in GemNetOC.forward(self, data)
1196 (
1197 main_graph,
1198 a2a_graph,
(...)
1205 quad_idx,
1206 ) = self.get_graphs_and_indices(data)
1207 _, idx_t = main_graph["edge_index"]
1209 (
1210 basis_rad_raw,
1211 basis_atom_update,
1212 basis_output,
1213 bases_qint,
1214 bases_e2e,
1215 bases_a2e,
1216 bases_e2a,
1217 basis_a2a_rad,
-> 1218 ) = self.get_bases(
1219 main_graph=main_graph,
1220 a2a_graph=a2a_graph,
1221 a2ee2a_graph=a2ee2a_graph,
1222 qint_graph=qint_graph,
1223 trip_idx_e2e=trip_idx_e2e,
1224 trip_idx_a2e=trip_idx_a2e,
1225 trip_idx_e2a=trip_idx_e2a,
1226 quad_idx=quad_idx,
1227 num_atoms=num_atoms,
1228 )
1230 # Embedding block
1231 h = self.atom_emb(atomic_numbers)
File ~/work/fairchem/fairchem/src/fairchem/core/models/gemnet_oc/gemnet_oc.py:1091, in GemNetOC.get_bases(self, main_graph, a2a_graph, a2ee2a_graph, qint_graph, trip_idx_e2e, trip_idx_a2e, trip_idx_e2a, quad_idx, num_atoms)
1082 cosφ_cab_q, cosφ_abd, angle_cabd = self.calculate_quad_angles(
1083 main_graph["vector"],
1084 qint_graph["vector"],
1085 quad_idx,
1086 )
1088 basis_rad_cir_qint_raw, basis_cir_qint_raw = self.cbf_basis_qint(
1089 qint_graph["distance"], cosφ_abd
1090 )
-> 1091 basis_rad_sph_qint_raw, basis_sph_qint_raw = self.sbf_basis_qint(
1092 main_graph["distance"],
1093 cosφ_cab_q[quad_idx["trip_out_to_quad"]],
1094 angle_cabd,
1095 )
1096 if self.atom_edge_interaction:
1097 basis_rad_a2ee2a_raw = self.radial_basis_aeaint(a2ee2a_graph["distance"])
File /opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/site-packages/torch/nn/modules/module.py:1553, in Module._wrapped_call_impl(self, *args, **kwargs)
1551 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1552 else:
-> 1553 return self._call_impl(*args, **kwargs)
File /opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/site-packages/torch/nn/modules/module.py:1562, in Module._call_impl(self, *args, **kwargs)
1557 # If we don't have any hooks, we want to skip the rest of the logic in
1558 # this function, and just call forward.
1559 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1560 or _global_backward_pre_hooks or _global_backward_hooks
1561 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1562 return forward_call(*args, **kwargs)
1564 try:
1565 result = None
File ~/work/fairchem/fairchem/src/fairchem/core/models/gemnet_oc/layers/spherical_basis.py:132, in SphericalBasisLayer.forward(self, D_ca, cosφ_cab, θ_cabd)
130 def forward(self, D_ca, cosφ_cab, θ_cabd):
131 rad_basis = self.radial_basis(D_ca)
--> 132 sph_basis = self.spherical_basis(cosφ_cab, θ_cabd)
133 # (num_quadruplets, num_spherical**2)
135 if self.scale_basis:
File ~/work/fairchem/fairchem/src/fairchem/core/models/gemnet_oc/layers/spherical_basis.py:116, in SphericalBasisLayer.__init__.<locals>.<lambda>(cosφ, θ)
111 elif sbf_name == "legendre_outer":
112 circular_basis = get_sph_harm_basis(num_spherical, zero_m_only=True)
113 self.spherical_basis = lambda cosφ, ϑ: (
114 circular_basis(cosφ)[:, :, None]
115 * circular_basis(torch.cos(ϑ))[:, None, :]
--> 116 ).reshape(cosφ.shape[0], -1)
118 elif sbf_name == "gaussian_outer":
119 self.circular_basis = GaussianBasis(
120 start=-1, stop=1, num_gaussians=num_spherical, **sbf_hparams
121 )
RuntimeError: cannot reshape tensor of 0 elements into shape [0, -1] because the unspecified dimension size -1 can be any value and is ambiguous
To tag or not?#
Some models use tags to determine which atoms to calculate energies for. For example, GemNet uses tag=1 to indicate that an atom should be calculated. ASE sets all tags to 0 by default, so if you do not set them you will get an error with this model:
%%capture
from fairchem.core.common.relaxation.ase_utils import OCPCalculator
from fairchem.core.models.model_registry import model_name_to_local_file
checkpoint_path = model_name_to_local_file('GemNet-OC-S2EFS-OC20+OC22', local_cache='/tmp/fairchem_checkpoints/')
calc = OCPCalculator(checkpoint_path=checkpoint_path)
%%capture
atoms = molecule('CH4')
atoms.set_calculator(calc)
atoms.get_potential_energy() # error
RuntimeError: cannot reshape tensor of 0 elements into shape [0, -1] because the unspecified dimension size -1 can be any value and is ambiguous
atoms = molecule('CH4')
atoms.set_tags(np.ones(len(atoms)))  # <- critical line for GemNet
atoms.calc = calc
atoms.get_potential_energy()
-23.71796226501465
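If you switch between models often, a small helper can apply this fix only when it is needed. This is a minimal sketch that assumes an ASE `Atoms`-like object (anything with `get_tags()`, `set_tags()` and `len()`). `ensure_tagged` is a hypothetical name. The helper only tags atoms when every tag is still the ASE default of 0. Slabs that already carry OC20-style tags are therefore left alone.

```python
def ensure_tagged(atoms, tag=1):
    """Set every atom's tag to `tag` if no atom has been tagged yet.

    ASE initialises all tags to 0, which makes the GemNet-OC checkpoint
    above fail. Systems that already carry non-zero tags (e.g. OC20-style
    slabs with 0 = subsurface, 1 = surface, 2 = adsorbate) are unchanged.
    """
    if all(t == 0 for t in atoms.get_tags()):
        atoms.set_tags([tag] * len(atoms))
    return atoms
```

For example, `ensure_tagged(molecule('CH4'))` should give the same tags as the `np.ones(len(atoms))` call above.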
Not all models require tags, though: this EquiformerV2 model does not use them. Whether a model needs tags is another detail to keep in mind.
from fairchem.core.common.relaxation.ase_utils import OCPCalculator
from fairchem.core.models.model_registry import model_name_to_local_file
checkpoint_path = model_name_to_local_file('EquiformerV2-31M-S2EF-OC20-All+MD', local_cache='/tmp/fairchem_checkpoints/')
calc = OCPCalculator(checkpoint_path=checkpoint_path)
INFO:root:Checking local cache: /tmp/fairchem_checkpoints/ for model EquiformerV2-31M-S2EF-OC20-All+MD
WARNING:root:Detected old config, converting to new format. Consider updating to avoid potential incompatibilities.
INFO:root:amp: true
INFO:root:Loading model: equiformer_v2
WARNING:root:equiformer_v2 (EquiformerV2) class is deprecated in favor of equiformer_v2_backbone_and_heads (EquiformerV2BackboneAndHeads)
INFO:root:Loaded EquiformerV2 with 31058690 parameters.
INFO:root:Loading checkpoint in inference-only mode, not loading keys associated with trainer state!
/home/runner/work/fairchem/fairchem/src/fairchem/core/modules/normalization/normalizer.py:69: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
"mean": torch.tensor(state_dict["mean"]),
WARNING:root:No seed has been set in modelcheckpoint or OCPCalculator! Results may not be reproducible on re-run
from ase.build import molecule
atoms = molecule('CH4')
atoms.calc = calc  # set_calculator() is deprecated in ASE
atoms.get_potential_energy()
/tmp/ipykernel_2669/4094489779.py:3: DeprecationWarning: Please use atoms.calc = calc
atoms.set_calculator(calc)
/home/runner/work/fairchem/fairchem/src/fairchem/core/trainers/ocp_trainer.py:472: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
with torch.cuda.amp.autocast(enabled=self.scaler is not None):
-0.42973706126213074
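The gas-phase failure happens because an atom has no neighbors within the model's cutoff, so one option is to check for such atoms before calling the calculator. This is a minimal plain-NumPy sketch built on assumptions. `isolated_atoms` is a hypothetical helper and is not part of fairchem. The system is treated as non-periodic, so periodic images are ignored. You must also supply the cutoff radius your model actually uses; the config printed above lists `max_radius: 12.0`.

```python
import numpy as np


def isolated_atoms(positions: np.ndarray, cutoff: float) -> np.ndarray:
    """Return indices of atoms with no other atom within `cutoff` (Angstrom).

    Assumes a non-periodic system: periodic images are NOT considered.
    """
    pos = np.asarray(positions, dtype=float).reshape(-1, 3)
    if len(pos) == 0:
        return np.array([], dtype=int)
    # Pairwise distance matrix, shape (n, n)
    diff = pos[:, None, :] - pos[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # An atom is not its own neighbor: mask the diagonal
    np.fill_diagonal(dist, np.inf)
    has_neighbor = (dist <= cutoff).any(axis=1)
    return np.flatnonzero(~has_neighbor)


# A single atom always has no neighbors
print(isolated_atoms(np.zeros((1, 3)), cutoff=12.0))
```

For an ASE `Atoms` object you would pass `atoms.get_positions()`. A non-empty result flags the no-neighbor case described above.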
Stochastic simulation results
Some models (SCN, eSCN, EquiformerV2) are not deterministic, so you can get slightly different answers each time you run them. In the example below, the standard deviation of the energy over 10 evaluations is about 1e-6 eV. See Issue 563 for more discussion. This happens because a random selection is made to sample edges, and a different selection is made on each run.
import numpy as np
from ase.build import fcc111, add_adsorbate
from fairchem.core.common.relaxation.ase_utils import OCPCalculator
from fairchem.core.models.model_registry import model_name_to_local_file

checkpoint_path = model_name_to_local_file('EquiformerV2-31M-S2EF-OC20-All+MD', local_cache='/tmp/fairchem_checkpoints/')
calc = OCPCalculator(checkpoint_path=checkpoint_path, cpu=True)

# O adsorbed in the fcc hollow site of a Pt(111) slab
slab = fcc111('Pt', size=(2, 2, 5), vacuum=10.0)
add_adsorbate(slab, 'O', height=1.2, position='fcc')
slab.calc = calc  # set_calculator() is deprecated in ASE

results = []
for i in range(10):
    # Calling calculate() directly reruns the model on every iteration
    calc.calculate(slab, ['energy'], None)
    results.append(slab.get_potential_energy())

print(np.mean(results), np.std(results))
for result in results:
    print(result)
INFO:root:Checking local cache: /tmp/fairchem_checkpoints/ for model EquiformerV2-31M-S2EF-OC20-All+MD
1.2127598762512206 1.2072552586719669e-06
1.2127611637115479
1.2127580642700195
1.2127602100372314
1.2127587795257568
1.212759256362915
1.212759017944336
1.2127611637115479
1.2127604484558105
1.2127618789672852
1.2127587795257568
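If the run-to-run noise matters for your application, one workaround is to average several evaluations. Averaging n independent evaluations shrinks the noise in the mean by roughly a factor of √n. This is a minimal sketch that assumes `predict` is any zero-argument function returning an energy. The names `average_predictions` and `fake_energy` are hypothetical, and the fake predictor only mimics the spread shown above. With the calculator, `predict` could call `calc.calculate(slab, ['energy'], None)` and then return `slab.get_potential_energy()`, as the loop above does.

```python
import random
import statistics
from typing import Callable, Tuple


def average_predictions(predict: Callable[[], float], n: int = 10) -> Tuple[float, float]:
    """Call a (possibly stochastic) predictor n times.

    Returns (mean, population standard deviation). The population std
    matches numpy's default np.std(..., ddof=0) used in the loop above.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    values = [float(predict()) for _ in range(n)]
    return statistics.fmean(values), statistics.pstdev(values)


# Stand-in predictor: a fixed value plus small Gaussian noise (hypothetical),
# seeded so the run is repeatable.
rng = random.Random(0)


def fake_energy() -> float:
    return 1.21276 + rng.gauss(0.0, 1e-6)


mean, std = average_predictions(fake_energy, n=10)
print(mean, std)
```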
The forces don’t sum to zero
In DFT, the forces on all the atoms should sum to zero. A nonzero sum means a net force is acting on the system as a whole. fairchem models do not enforce this: individual atomic forces are predicted with no constraint that they sum to zero. If the force predictions are very accurate, they sum to nearly zero. You can remove the remaining net force by subtracting the mean force from each atom's force. This makes the sum zero up to floating-point error.
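Subtracting the mean removes the net force exactly because every atom receives the same correction. With $N$ atoms and predicted forces $\mathbf{f}_i$:

$$\bar{\mathbf{f}} = \frac{1}{N}\sum_{i=1}^{N}\mathbf{f}_i, \qquad \sum_{i=1}^{N}\left(\mathbf{f}_i - \bar{\mathbf{f}}\right) = \sum_{i=1}^{N}\mathbf{f}_i - N\bar{\mathbf{f}} = \mathbf{0}.$$

Each atom's force changes by $-\bar{\mathbf{f}}$, so the correction is small when the net force is small compared with $N$. The Pt(111) slab with one O adsorbate used below has 21 atoms (20 Pt + 1 O). Its net force of about (0.016, 0.0017, -0.072) eV/Å therefore becomes a per-atom correction of roughly (-0.0008, -0.0001, +0.0034) eV/Å.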
from ase.build import fcc111, add_adsorbate
from fairchem.core.common.relaxation.ase_utils import OCPCalculator
from fairchem.core.models.model_registry import model_name_to_local_file

checkpoint_path = model_name_to_local_file('EquiformerV2-31M-S2EF-OC20-All+MD', local_cache='/tmp/fairchem_checkpoints/')
calc = OCPCalculator(checkpoint_path=checkpoint_path, cpu=True)

# O adsorbed in the fcc hollow site of a Pt(111) slab
slab = fcc111('Pt', size=(2, 2, 5), vacuum=10.0)
add_adsorbate(slab, 'O', height=1.2, position='fcc')
slab.calc = calc  # set_calculator() is deprecated in ASE

f = slab.get_forces()
# Net force on the whole system; ideally (0, 0, 0)
f.sum(axis=0)
array([ 0.01598842, 0.00170618, -0.07197857], dtype=float32)
# This makes them sum closer to zero by removing net translational force
(f - f.mean(axis=0)).sum(axis=0)
array([-5.9488229e-08, -7.1711838e-08, 2.3841858e-07], dtype=float32)
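If you apply this correction routinely, you can wrap it in a small function. This is a minimal NumPy sketch. `remove_net_force` is a hypothetical helper, not a fairchem API. It subtracts the mean over all atoms and does not treat constrained atoms specially.

```python
import numpy as np


def remove_net_force(forces: np.ndarray) -> np.ndarray:
    """Return a copy of `forces` (shape (N, 3)) with the mean force subtracted.

    The result sums to zero over atoms, up to floating-point error.
    Input is cast to float64, so float32 model output is upcast.
    """
    f = np.asarray(forces, dtype=float)
    if f.ndim != 2 or f.shape[1] != 3:
        raise ValueError(f"expected forces of shape (N, 3), got {f.shape}")
    return f - f.mean(axis=0)


# Usage with random stand-in forces (not model output)
rng = np.random.default_rng(0)
forces = rng.normal(size=(21, 3))
corrected = remove_net_force(forces)
print(forces.sum(axis=0))     # nonzero net force
print(corrected.sum(axis=0))  # close to zero in every component
```

The float64 cast is a design choice. It keeps the residual much closer to zero than the ~1e-7 seen above with the float32 forces.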