Open Catalyst 2020 Nudged Elastic Band (OC20NEB)

Open Catalyst 2020 Nudged Elastic Band (OC20NEB)#

Overview#

This is a validation dataset which was used to assess model performance in CatTSunami: Accelerating Transition State Energy Calculations with Pre-trained Graph Neural Networks. It is comprised of 932 NEB relaxation trajectories. There are three different types of reactions represented: desorptions, dissociations, and transfers. NEB calculations allow us to find transition states. The rate of reaction is determined by the transition state energy, so access to transition states is very important for catalysis research. For more information, check out the paper.

File Structure and Contents#

The tar file contains 3 subdirectories: dissociations, desorptions, and transfers. As the names imply, these directories contain the converged DFT trajectories for each of the reaction classes. Within these directories, the trajectories are named to identify the contents of the file. Here is an example and the anatomy of the name:

desorption_id_83_2409_9_111-4_neb1.0.traj

desorption indicates the reaction type (dissociation and transfer are the other possibilities)
id identifies that the material belongs to the validation in domain split (ood - out of domain is th e other possibility)
83 is the task id. This does not provide relavent information
2409 is the bulk index of the bulk used in the ocdata bulk pickle file
9 is the reaction index. for each reaction type there is a reaction pickle file in the repository. In this case it is the 9th entry to that pickle file
111-4 the first 3 numbers are the miller indices (i.e. the (1,1,1) surface), and the last number cooresponds to the shift value. In this case the 4th shift enumerated was the one used.
neb1.0 the number here indicates the k value used. For the full dataset, 1.0 was used so this does not distiguish any of the trajectories from one another.

The content of these trajectory files is the repeating frame sets. Despite the initial and final frames not being optimized during the NEB, the initial and final frames are saved for every iteration in the trajectory. For the dataset, 10 frames were used - 8 which were optimized over the neb. So the length of the trajectory is the number of iterations (N) * 10. If you wanted to look at the frame set prior to optimization and the optimized frame set, you could get them like this:

from __future__ import annotations

!wget https://dl.fbaipublicfiles.com/opencatalystproject/data/large_files/desorption_id_83_2409_9_111-4_neb1.0.traj

from ase.io import read

traj = read("desorption_id_83_2409_9_111-4_neb1.0.traj", ":")
unrelaxed_frames = traj[0:10]
relaxed_frames = traj[-10:]

--2025-07-01 20:11:20--  https://dl.fbaipublicfiles.com/opencatalystproject/data/large_files/desorption_id_83_2409_9_111-4_neb1.0.traj
Resolving dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)... 

18.155.192.35, 18.155.192.72, 18.155.192.114, ...
Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|18.155.192.35|:443... connected.
HTTP request sent, awaiting response... 

200 OK
Length: 10074935 (9.6M) [binary/octet-stream]
Saving to: ‘desorption_id_83_2409_9_111-4_neb1.0.traj’


          desorptio   0%[                    ]       0  --.-KB/s               

desorption_id_83_24 100%[===================>]   9.61M  48.1MB/s    in 0.2s    

2025-07-01 20:11:20 (48.1 MB/s) - ‘desorption_id_83_2409_9_111-4_neb1.0.traj’ saved [10074935/10074935]

Download#

Splits	Size of compressed version (in bytes)	Size of uncompressed version (in bytes)	MD5 checksum (download link)
ASE Trajectories	1.5G	6.3G	52af34a93758c82fae951e52af445089

Use#

One more note: We have not prepared an lmdb for this dataset. This is because it is NEB calculations are not supported directly in ocp. You must use the ase native OCP class along with ase infrastructure to run NEB calculations. Here is an example of a use:

import os 

from ase.io import read
from ase.mep import DyNEB
from ase.optimize import BFGS
from fairchem.core import FAIRChemCalculator, pretrained_mlip

traj = read("desorption_id_83_2409_9_111-4_neb1.0.traj", ":")
images = traj[0:10]
predictor = pretrained_mlip.get_predict_unit("uma-s-1")

neb = DyNEB(images, k=1)
for image in images:
    image.calc = FAIRChemCalculator(predictor, task_name="oc20")

optimizer = BFGS(
    neb,
    trajectory="neb.traj",
)

# Use a small number of steps here to keep the docs fast during CI, but otherwise do quite reasonable settings.
fast_docs = os.environ.get("FAST_DOCS", "false").lower() == "true"
if fast_docs:
    optimization_steps = 20
else:
    optimization_steps = 300

conv = optimizer.run(fmax=0.45, steps=optimization_steps)
if conv:
    neb.climb = True
    conv = optimizer.run(fmax=0.05, steps=optimization_steps)

/home/runner/work/_tool/Python/3.12.11/x64/lib/python3.12/site-packages/torchtnt/utils/version.py:12: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  import pkg_resources

WARNING:root:device was not explicitly set, using device='cuda'.

      Step     Time          Energy          fmax
BFGS:    0 20:11:36     -305.751997        5.179585

BFGS:    1 20:11:37     -305.678596       11.366769

BFGS:    2 20:11:38     -305.904276        1.882657

BFGS:    3 20:11:39     -305.919912        2.602309

BFGS:    4 20:11:41     -305.997625        2.277944

BFGS:    5 20:11:42     -305.991825        6.859284

BFGS:    6 20:11:43     -306.241676        9.558002

BFGS:    7 20:11:44     -306.214928        3.379100

BFGS:    8 20:11:45     -306.278919        4.740389

BFGS:    9 20:11:46     -306.303234        0.742360

BFGS:   10 20:11:48     -306.317829        0.645260

BFGS:   11 20:11:49     -306.347280        1.625512

BFGS:   12 20:11:50     -306.403408        2.014757

BFGS:   13 20:11:51     -306.437916        0.595279

BFGS:   14 20:11:52     -306.408169        0.702886

BFGS:   15 20:11:53     -306.445417        1.371093

BFGS:   16 20:11:54     -306.486254        0.860171

BFGS:   17 20:11:56     -306.484566        0.428249

BFGS:   18 20:11:57     -306.282099        8.999253

BFGS:   19 20:11:58     -306.460195        1.420460

BFGS:   20 20:11:59     -306.462856        1.054620

BFGS:   21 20:12:00     -306.348940        2.319915

BFGS:   22 20:12:01     -306.265251        3.804882

BFGS:   23 20:12:03     -306.288505        1.436674

BFGS:   24 20:12:04     -306.177399        3.162593

BFGS:   25 20:12:05     -305.946573        5.929643

BFGS:   26 20:12:06     -305.689093        6.935770

BFGS:   27 20:12:07     -305.423643        6.731104

BFGS:   28 20:12:09     -305.160787        5.664719

BFGS:   29 20:12:10     -304.897379        4.218149

BFGS:   30 20:12:11     -304.619451        2.904649

BFGS:   31 20:12:12     -304.338980        2.870969

BFGS:   32 20:12:13     -304.094957        2.722095

BFGS:   33 20:12:15     -303.909826        2.858228

BFGS:   34 20:12:16     -303.791689        4.177563

BFGS:   35 20:12:17     -303.683710        4.988073

BFGS:   36 20:12:18     -303.569067        5.477739

BFGS:   37 20:12:19     -303.406364        5.737056