Open Catalyst 2022 (OC22)#

Structure to Total Energy and Forces (S2EF-Total) task#

For this task’s train, validation and test sets, we provide precomputed LMDBs that can be directly used with dataloaders provided in our code. The LMDBs contain input structures from all points in relaxation trajectories along with the energy of the structure and the atomic forces. The validation and test datasets are broken into subsplits based on in-distribution and out-of-distribution materials relative to the training dataset. All LMDBs are compressed into a single .tar.gz file.

Splits

Size of compressed version (in bytes)

Size of uncompressed version (in bytes)

MD5 checksum (download link)

Train (all splits) + Validation (all splits) + test (all splits)

20G

71G

ebea523c6f8d61248a37b4dd660b11e6

Initial Structure to Relaxed Structure (IS2RS) and Initial Structure to Relaxed Total Energy (IS2RE-Total) tasks#

For IS2RE-Total / IS2RS training, validation and test sets, we provide precomputed LMDBs that can be directly used with dataloaders provided in our code. The LMDBs contain input initial structures and the output relaxed structures and energies. The validation and test datasets are broken into subsplits based on in-distribution and out-of-distribution materials relative to the training dataset. All LMDBs are compressed into a single .tar.gz file.

Splits

Size of compressed version (in bytes)

Size of uncompressed version (in bytes)

MD5 checksum (download link)

Train (all splits) + Validation (all splits) + test (all splits)

109M

424M

b35dc24e99ef3aeaee6c5c949903de94

Relaxation Trajectories#

System trajectories (optional download)#

We provide relaxation trajectories for all systems used in train and validation sets of S2EF-Total and IS2RE-Total/RS task:

Number

Size of compressed version (in bytes)

Size of uncompressed version (in bytes)

MD5 checksum (download link)

S2EF and IS2RE (both train and validation)

34G

80G

977b6be1cbac6864e63c4c7fbf8a3fce

OC22 Mappings#

Data mapping information#

We provide a Python pickle file containing information about the slab and adsorbates for each of the systems in OC22 dataset. Loading the pickle file will load a Python dictionary. The keys of this dictionary are the system-ids (of the format XYZ where XYZ is an integer, corresponding to the sid in the LMDB Data object), and the corresponding value of each key is a dictionary with information about:

  • bulk_id: Materials Project ID of the bulk system used corresponding the the catalyst surface

  • bulk_symbols: Chemical composition of the bulk counterpart

  • miller_index: 3-tuple of integers indicating the Miller indices of the surface

  • traj_id: Identifier associated with the accompanying raw trajectory (if available)

  • slab_sid: Identifier associated with the corresponding slab (if available)

  • ads_symbols: Chemical composition of the adsorbate counterpart (adosrbate+slabs only)

  • nads: Number of adsorbates present

Downloadable link: https://dl.fbaipublicfiles.com/opencatalystproject/data/oc22/oc22_metadata.pkl (MD5 checksum: 13dc06c6510346d8a7f614d5b26c8ffa )

An example adsorbate+slab entry:

 6877: {'bulk_id': 'mp-559112',
  'miller_index': (1, 0, 0),
  'nads': 1,
  'traj_id': 'K2Zn6O7_mp-559112_RyQXa0N0uc_ohyUKozY3G',
  'bulk_symbols': 'K4Zn12O14',
  'slab_sid': 30859,
  'ads_symbols': 'O2'},

An example slab entry:

 34815: {'bulk_id': 'mp-18793',
  'miller_index': (1, 2, 1),
  'nads': 0,
  'traj_id': 'LiCrO2_mp-18793_clean_3HDHBg6TIz',
  'bulk_symbols': 'Li2Cr2O4'},

#

OC20 reference information#

In order to train models on OC20 total energy, we provide a Python pickle file containing the energy necessary to convert adsorption energy values to total energy. Loading the pickle file will load a Python dictionary. The keys of this dictionary are the system-ids (of the format random<XYZ> where XYZ is an integer, corresponding to the sid in the LMDB Data object), and the corresponding value of each key is the energy to be added to OC20 energy values. To train on total energies for OC20, specify the path to this pickle file in your training configs.

Downloadable link: https://dl.fbaipublicfiles.com/opencatalystproject/data/oc22/oc20_ref.pkl (MD5 checksum: 043e1e0b0cce64c62f01a8563dbc3178)

#

Citing OC22#

The Open Catalyst 2022 (OC22) dataset is licensed under a Creative Commons Attribution 4.0 License.

Please consider citing the following paper in any research manuscript using the OC22 dataset:

@article{oc22_dataset,
    author = {Tran*, Richard and Lan*, Janice and Shuaibi*, Muhammed and Wood*, Brandon and Goyal*, Siddharth and Das, Abhishek and Heras-Domingo, Javier and Kolluru, Adeesh and Rizvi, Ammar and Shoghi, Nima and Sriram, Anuroop and Ulissi, Zachary and Zitnick, C. Lawrence},
    title = {The Open Catalyst 2022 (OC22) dataset and challenges for oxide electrocatalysts},
    journal = {ACS Catalysis},
    year={2023},
}