Open Catalyst 2022 (OC22)

Dataset Overview
| Property | Value |
| --- | --- |
| Size | ~62K oxide systems |
| Tasks | S2EF-Total, IS2RE-Total, IS2RS |
| Focus | Oxide electrocatalysts |
| Energy Type | DFT total energies |
| Paper | ACS Catalysis 2023 |
| License | CC-BY-4.0 |

Structure to Total Energy and Forces (S2EF-Total) task

For this task’s train, validation and test sets, we provide precomputed LMDBs that can be directly used with dataloaders provided in our code. The LMDBs contain input structures from all points in relaxation trajectories along with the energy of the structure and the atomic forces. The validation and test datasets are broken into subsplits based on in-distribution and out-of-distribution materials relative to the training dataset. All LMDBs are compressed into a single .tar.gz file.

| Splits | Compressed size | Uncompressed size | MD5 checksum (download link) |
| --- | --- | --- | --- |
| Train (all splits) + Validation (all splits) + Test (all splits) | 20G | 71G | ebea523c6f8d61248a37b4dd660b11e6 |
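
Since the archive is large, it can be worth checking the download against the published MD5 checksum before extracting it. The following is a minimal sketch using only the Python standard library; the function names `md5_of_file` and `verify_download` and the local file name in the usage comment are illustrative assumptions, not part of the OC22 tooling.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Stream the file so a multi-gigabyte archive never sits in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True if the file's MD5 digest matches the published checksum."""
    return md5_of_file(path) == expected_md5.strip().lower()


# Checksum listed for the S2EF-Total archive in the table above.
S2EF_TOTAL_MD5 = "ebea523c6f8d61248a37b4dd660b11e6"

# Usage (the file name is a hypothetical local path):
# verify_download("oc22_s2ef_total.tar.gz", S2EF_TOTAL_MD5)
```

The same helper applies to the other archives and pickle files on this page by passing their respective checksums.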

Initial Structure to Relaxed Structure (IS2RS) and Initial Structure to Relaxed Total Energy (IS2RE-Total) tasks

For IS2RE-Total / IS2RS training, validation and test sets, we provide precomputed LMDBs that can be directly used with dataloaders provided in our code. The LMDBs contain input initial structures and the output relaxed structures and energies. The validation and test datasets are broken into subsplits based on in-distribution and out-of-distribution materials relative to the training dataset. All LMDBs are compressed into a single .tar.gz file.

| Splits | Compressed size | Uncompressed size | MD5 checksum (download link) |
| --- | --- | --- | --- |
| Train (all splits) + Validation (all splits) + Test (all splits) | 109M | 424M | b35dc24e99ef3aeaee6c5c949903de94 |
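
Because all LMDBs ship in a single .tar.gz file, the archive has to be unpacked before the dataloaders can read it. The sketch below extracts an archive with the standard-library `tarfile` module and lists the LMDB files it finds. The function name `extract_archive` is hypothetical, and the `.lmdb` file suffix is an assumption about the archive layout rather than something stated on this page.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive and return the relative paths of *.lmdb files."""
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        # Refuse members that would be written outside the destination folder.
        for member in tar.getmembers():
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(dest)
    # Assumption: LMDB databases are stored as files ending in ".lmdb".
    return sorted(
        p.relative_to(dest).as_posix()
        for p in dest.rglob("*.lmdb")
        if p.is_file()
    )
```

The returned paths can then be placed in the dataset section of a training config.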

Relaxation Trajectories

System trajectories (optional download)

We provide relaxation trajectories for all systems used in the train and validation sets of the S2EF-Total and IS2RE-Total/IS2RS tasks:

| Splits | Compressed size | Uncompressed size | MD5 checksum (download link) |
| --- | --- | --- | --- |
| S2EF and IS2RE (both train and validation) | 34G | 80G | 977b6be1cbac6864e63c4c7fbf8a3fce |

OC22 Mappings

Data mapping information

We provide a Python pickle file containing information about the slab and adsorbates for each system in the OC22 dataset. Loading the pickle file yields a Python dictionary. Its keys are the system ids (of the format XYZ, where XYZ is an integer corresponding to the sid in the LMDB Data object), and each value is a dictionary describing that system, with the fields shown in the example entries below: bulk_id, miller_index, nads, traj_id and bulk_symbols, plus slab_sid and ads_symbols for adsorbate+slab systems.

Download link: https://dl.fbaipublicfiles.com/opencatalystproject/data/oc22/oc22_metadata.pkl (MD5 checksum: 13dc06c6510346d8a7f614d5b26c8ffa)

An example adsorbate+slab entry:

 6877: {'bulk_id': 'mp-559112',
  'miller_index': (1, 0, 0),
  'nads': 1,
  'traj_id': 'K2Zn6O7_mp-559112_RyQXa0N0uc_ohyUKozY3G',
  'bulk_symbols': 'K4Zn12O14',
  'slab_sid': 30859,
  'ads_symbols': 'O2'},

An example slab entry:

 34815: {'bulk_id': 'mp-18793',
  'miller_index': (1, 2, 1),
  'nads': 0,
  'traj_id': 'LiCrO2_mp-18793_clean_3HDHBg6TIz',
  'bulk_symbols': 'Li2Cr2O4'},
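
As a rough illustration of how this mapping might be used, the sketch below loads the pickle and builds a one-line description of a system from its sid. The helper names `load_metadata` and `describe_system` are hypothetical, and treating `nads == 0` as the marker of a clean slab is an assumption inferred from the two example entries above. Only unpickle files obtained from a trusted source, since loading a pickle can execute arbitrary code.

```python
import pickle


def load_metadata(path):
    """Load the OC22 metadata pickle into a dict keyed by integer system id."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def describe_system(sid, metadata):
    """Return a short human-readable description of one OC22 system."""
    entry = metadata[sid]
    surface = f"{entry['bulk_symbols']} ({entry['bulk_id']}) {entry['miller_index']}"
    # Assumption: entries with nads == 0 are clean slabs without adsorbates.
    if entry["nads"] == 0:
        return f"clean slab of {surface}"
    return f"{entry['ads_symbols']} on {surface}, clean slab sid {entry['slab_sid']}"


# Usage (the path is a hypothetical local copy of oc22_metadata.pkl):
# metadata = load_metadata("oc22_metadata.pkl")
# print(describe_system(6877, metadata))
```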

OC20 reference information

To train models on OC20 total energies, we provide a Python pickle file containing the reference energies needed to convert adsorption energy values to total energies. Loading the pickle file yields a Python dictionary. Its keys are the system ids (of the format random<XYZ>, where XYZ is an integer corresponding to the sid in the LMDB Data object), and each value is the energy to be added to the corresponding OC20 energy value. To train on total energies for OC20, specify the path to this pickle file in your training configs.

Downloadable link: https://dl.fbaipublicfiles.com/opencatalystproject/data/oc22/oc20_ref.pkl (MD5 checksum: 043e1e0b0cce64c62f01a8563dbc3178)
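
The conversion described above amounts to a dictionary lookup followed by an addition. A minimal sketch, assuming the LMDB sid is an integer that must be prefixed with "random" to form the key; the function names and the numeric values in the usage comment are made up for illustration:

```python
import pickle


def load_oc20_reference(path):
    """Load the OC20 reference pickle into a dict keyed by 'random<XYZ>' ids."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def oc20_total_energy(sid, adsorption_energy, reference):
    """Convert an OC20 adsorption energy to a total energy.

    `sid` may be the integer from the LMDB Data object or an already
    formatted 'random<XYZ>' string.
    """
    key = sid if isinstance(sid, str) else f"random{sid}"
    # The reference value is the energy to be added to the OC20 energy.
    return adsorption_energy + reference[key]


# Usage with made-up numbers (not real OC20 values):
# reference = {"random123": -300.0}
# oc20_total_energy(123, -0.5, reference)
```

When training through the provided configs, this conversion is handled by pointing the config at the pickle file, so a helper like this is mainly useful for offline analysis.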

Citing OC22

The Open Catalyst 2022 (OC22) dataset is licensed under a Creative Commons Attribution 4.0 License.

Please consider citing the following paper in any research manuscript using the OC22 dataset:

@article{oc22_dataset,
    author = {Tran*, Richard and Lan*, Janice and Shuaibi*, Muhammed and Wood*, Brandon and Goyal*, Siddharth and Das, Abhishek and Heras-Domingo, Javier and Kolluru, Adeesh and Rizvi, Ammar and Shoghi, Nima and Sriram, Anuroop and Ulissi, Zachary and Zitnick, C. Lawrence},
    title = {The Open Catalyst 2022 (OC22) dataset and challenges for oxide electrocatalysts},
    journal = {ACS Catalysis},
    year={2023},
}

References

  1. Tran, R., Lan, J., Shuaibi, M., Wood, B. M., Goyal, S., Das, A., Heras-Domingo, J., Kolluru, A., Rizvi, A., Shoghi, N., Sriram, A., Therrien, F., Abed, J., Voznyy, O., Sargent, E. H., Ulissi, Z., & Zitnick, C. L. (2023). The Open Catalyst 2022 (OC22) Dataset and Challenges for Oxide Electrocatalysts. ACS Catalysis, 13(5), 3066–3084. doi:10.1021/acscatal.2c05426