Open Catalyst 2022 (OC22)#
Structure to Total Energy and Forces (S2EF-Total) task#
For this task’s train, validation and test sets, we provide precomputed LMDBs that can be directly used with dataloaders provided in our code. The LMDBs contain input structures from all points in relaxation trajectories along with the energy of the structure and the atomic forces. The validation and test datasets are broken into subsplits based on in-distribution and out-of-distribution materials relative to the training dataset. All LMDBs are compressed into a single .tar.gz
file.
Splits |
Size of compressed version (in bytes) |
Size of uncompressed version (in bytes) |
MD5 checksum (download link) |
---|---|---|---|
Train (all splits) + Validation (all splits) + test (all splits) |
20G |
71G |
|
Initial Structure to Relaxed Structure (IS2RS) and Initial Structure to Relaxed Total Energy (IS2RE-Total) tasks#
For IS2RE-Total / IS2RS training, validation and test sets, we provide precomputed LMDBs that can be directly used with dataloaders provided in our code. The LMDBs contain input initial structures and the output relaxed structures and energies. The validation and test datasets are broken into subsplits based on in-distribution and out-of-distribution materials relative to the training dataset. All LMDBs are compressed into a single .tar.gz
file.
Splits |
Size of compressed version (in bytes) |
Size of uncompressed version (in bytes) |
MD5 checksum (download link) |
---|---|---|---|
Train (all splits) + Validation (all splits) + test (all splits) |
109M |
424M |
|
Relaxation Trajectories#
System trajectories (optional download)#
We provide relaxation trajectories for all systems used in train and validation sets of S2EF-Total and IS2RE-Total/RS task:
Number |
Size of compressed version (in bytes) |
Size of uncompressed version (in bytes) |
MD5 checksum (download link) |
---|---|---|---|
S2EF and IS2RE (both train and validation) |
34G |
80G |
|
OC22 Mappings#
Data mapping information#
We provide a Python pickle file containing information about the slab and adsorbates for each of the systems in OC22 dataset. Loading the pickle file will load a Python dictionary. The keys of this dictionary are the system-ids (of the format XYZ
where XYZ
is an integer, corresponding to the sid
in the LMDB Data object), and the corresponding value of each key is a dictionary with information about:
bulk_id
: Materials Project ID of the bulk system used corresponding the the catalyst surfacebulk_symbols
: Chemical composition of the bulk counterpartmiller_index
: 3-tuple of integers indicating the Miller indices of the surfacetraj_id
: Identifier associated with the accompanying raw trajectory (if available)slab_sid
: Identifier associated with the corresponding slab (if available)ads_symbols
: Chemical composition of the adsorbate counterpart (adosrbate+slabs only)nads
: Number of adsorbates present
Downloadable link: https://dl.fbaipublicfiles.com/opencatalystproject/data/oc22/oc22_metadata.pkl (MD5 checksum: 13dc06c6510346d8a7f614d5b26c8ffa
)
An example adsorbate+slab entry:
6877: {'bulk_id': 'mp-559112',
'miller_index': (1, 0, 0),
'nads': 1,
'traj_id': 'K2Zn6O7_mp-559112_RyQXa0N0uc_ohyUKozY3G',
'bulk_symbols': 'K4Zn12O14',
'slab_sid': 30859,
'ads_symbols': 'O2'},
An example slab entry:
34815: {'bulk_id': 'mp-18793',
'miller_index': (1, 2, 1),
'nads': 0,
'traj_id': 'LiCrO2_mp-18793_clean_3HDHBg6TIz',
'bulk_symbols': 'Li2Cr2O4'},
#
OC20 reference information#
In order to train models on OC20 total energy, we provide a Python pickle file containing the energy necessary to convert adsorption energy values to total energy. Loading the pickle file will load a Python dictionary. The keys of this dictionary are the system-ids (of the format random<XYZ>
where XYZ
is an integer, corresponding to the sid
in the LMDB Data object), and the corresponding value of each key is the energy to be added to OC20 energy values. To train on total energies for OC20, specify the path to this pickle file in your training configs.
Downloadable link: https://dl.fbaipublicfiles.com/opencatalystproject/data/oc22/oc20_ref.pkl (MD5 checksum: 043e1e0b0cce64c62f01a8563dbc3178
)
#
Citing OC22#
The Open Catalyst 2022 (OC22) dataset is licensed under a Creative Commons Attribution 4.0 License.
Please consider citing the following paper in any research manuscript using the OC22 dataset:
@article{oc22_dataset,
author = {Tran*, Richard and Lan*, Janice and Shuaibi*, Muhammed and Wood*, Brandon and Goyal*, Siddharth and Das, Abhishek and Heras-Domingo, Javier and Kolluru, Adeesh and Rizvi, Ammar and Shoghi, Nima and Sriram, Anuroop and Ulissi, Zachary and Zitnick, C. Lawrence},
title = {The Open Catalyst 2022 (OC22) dataset and challenges for oxide electrocatalysts},
journal = {ACS Catalysis},
year={2023},
}