Open Catalyst 2020 (OC20)

Dataset Overview
| Property | Value |
| --- | --- |
| Size | 133M+ DFT calculations |
| Systems | ~460K adsorbate-catalyst relaxations |
| Tasks | S2EF, IS2RE, IS2RS |
| Elements | 55 elements from periodic table |
| Adsorbates | 82 adsorbates |
| Paper | ACS Catalysis 2021 |
| License | CC-BY-4.0 |

Download and preprocess the dataset

IS2* datasets are stored as LMDB files and are ready to be used upon download. S2EF train+val datasets require an additional preprocessing step.

For convenience, a self-contained script, scripts/download_data.py, downloads, preprocesses, and organizes the data directories so that they are readily usable by the existing configs.

For IS2*, run the script as:

python scripts/download_data.py --task is2re

For S2EF train/val, run the script as:

python scripts/download_data.py --task s2ef --split SPLIT_SIZE --get-edges --num-workers WORKERS --ref-energy

For S2EF test, run the script as:

python scripts/download_data.py --task s2ef --split test

To download and process the dataset in a directory other than your local fairchem/data folder, add the --data-path command-line argument.
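The invocations above differ only in their flags, so they can be assembled programmatically. The following is a minimal sketch that uses only the flags shown in this section; the helper name `build_download_command` and its parameters are hypothetical, and the split and worker values are examples that are passed through unchecked.

```python
import sys
from typing import List, Optional


def build_download_command(
    task: str,
    split: Optional[str] = None,
    num_workers: Optional[int] = None,
    get_edges: bool = False,
    ref_energy: bool = False,
    data_path: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list for scripts/download_data.py (hypothetical helper)."""
    cmd = [sys.executable, "scripts/download_data.py", "--task", task]
    if split is not None:
        cmd += ["--split", split]
    if get_edges:
        cmd.append("--get-edges")
    if num_workers is not None:
        cmd += ["--num-workers", str(num_workers)]
    if ref_energy:
        cmd.append("--ref-energy")
    if data_path is not None:
        cmd += ["--data-path", data_path]
    return cmd


# S2EF train/val with edges and reference energies; "2M" and 8 are example values.
s2ef_cmd = build_download_command(
    "s2ef", split="2M", num_workers=8, get_edges=True, ref_energy=True
)
print(" ".join(s2ef_cmd[1:]))
```

The resulting list could be handed to `subprocess.run(cmd, check=True)` from the repository root; building a list rather than a shell string avoids quoting problems when the data path contains spaces.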

Note that the baseline configs expect the data to be found in fairchem/data, so make sure you symlink your directory or modify the paths in the configs accordingly.
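If the data lives elsewhere, the symlink can be created from Python as well as from the shell. This is a sketch under the assumption that fairchem/data does not exist yet; the function name `link_data_dir` and the example storage path are hypothetical.

```python
from pathlib import Path


def link_data_dir(custom_dir: str, expected_dir: str = "fairchem/data") -> Path:
    """Create a symlink at expected_dir that points to custom_dir."""
    target = Path(custom_dir).resolve()
    link = Path(expected_dir)
    # Refuse to overwrite an existing folder or link; that choice is left to the user.
    if link.is_symlink() or link.exists():
        raise FileExistsError(f"{link} already exists; remove it or edit the config paths")
    link.parent.mkdir(parents=True, exist_ok=True)
    link.symlink_to(target, target_is_directory=True)
    return link


# Example call (path is illustrative):
# link_data_dir("/mnt/storage/oc20")
```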

The following sections list dataset download links and sizes for various S2EF and IS2RE/IS2RS task splits. If you used the above download_data.py script to download and preprocess the data, you are good to go and can stop reading here!

Structure to Energy and Forces (S2EF) task

For this task’s train and validation sets, we provide compressed trajectory files with the input structures and output energies and forces. We provide precomputed LMDBs for the test sets. To use the train and validation datasets, first download the files and uncompress them. The uncompressed files are used to generate LMDBs, which are in turn used by the dataloaders to train the ML models. Code for the dataloaders and for generating the LMDBs may be found in the GitHub repository.

Four training datasets are provided with different sizes. Each smaller dataset is a subset of the larger ones, e.g., the 2M dataset is contained in the 20M and all datasets.

Four validation datasets are provided. Each corresponds to a subsplit used to evaluate a different type of extrapolation: in domain (id, same distribution as the training dataset), out of domain adsorbate (ood_ads, unseen adsorbate), out of domain catalyst (ood_cat, unseen catalyst composition), and out of domain both (ood_both, unseen adsorbate and catalyst composition).

For the test sets, we provide precomputed LMDBs for each of the 4 subsplits (In Domain, OOD Adsorbate, OOD Catalyst, OOD Both).

Each tarball has a README file containing details about file formats, number of structures / trajectories, etc.

| Splits | Size of compressed version (in bytes) | Size of uncompressed version (in bytes) | MD5 checksum (download link) |
| --- | --- | --- | --- |
| Train | | | |
| all | 225G | 1.1T | 12a7087bfd189a06ccbec9bc7add2bcd |
| 20M | 34G | 165G | 863bc983245ffc0285305a1850e19cf7 |
| 2M | 3.4G | 17G | 953474cb93f0b08cdc523399f03f7c36 |
| 200K | 344M | 1.7G | f8d0909c2623a393148435dede7d3a46 |
| Validation | | | |
| val_id | 1.7G | 8.3G | f57f7f5c1302637940f2cc858e789410 |
| val_ood_ads | 1.7G | 8.2G | 431ab0d7557a4639605ba8b67793f053 |
| val_ood_cat | 1.7G | 8.3G | 532d6cd1fe541a0ddb0aa0f99962b7db |
| val_ood_both | 1.9G | 9.5G | 5731862978d80502bbf7017d68c2c729 |
| Test (LMDBs for all splits) | 30G | 415G | bcada432482f6e87b24e14b6b744992a |
| Rattled data | 29G | 136G | 40431149b27b64ce1fb40cac4e2e064b |
| MD data | 42G | 306G | 9fed845aaab8fb4bf85e3a8db57796e0 |
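Given the size of these archives, it is worth checking a download against its listed MD5 checksum before uncompressing it. The sketch below uses Python's standard hashlib and reads the file in chunks so that large tarballs do not need to fit in memory; the function names and the example file name are illustrative.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected_md5: str) -> bool:
    """Compare a file's MD5 digest with the value listed in the table."""
    return md5_of_file(path) == expected_md5.strip().lower()


# Example call (the local file name is hypothetical):
# matches_checksum("s2ef_train_200K.tar", "f8d0909c2623a393148435dede7d3a46")
```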

Initial Structure to Relaxed Structure (IS2RS) and Initial Structure to Relaxed Energy (IS2RE) tasks

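For the IS2RS and IS2RE tasks, we provide a tarball of precomputed LMDBs covering the train, validation, and test splits, which can be used directly once downloaded and uncompressed.

Each tarball has a README file containing details about file formats, number of structures / trajectories, etc.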

| Splits | Size of compressed version (in bytes) | Size of uncompressed version (in bytes) | MD5 checksum (download link) |
| --- | --- | --- | --- |
| Train (all splits) + Validation (all splits) + test (all splits) | 8.1G | 97G | cfc04dd2f87b4102ab2f607240d25fb1 |
| Test-challenge 2021 (challenge details) | 1.3G | 17G | aed414cdd240fbb5670b5de6887a138b |

Relaxation Trajectories

Adsorbate+catalyst system trajectories (optional download)

| Split | Size of compressed version (in bytes) | Size of uncompressed version (in bytes) | MD5 checksum (download link) |
| --- | --- | --- | --- |
| All IS2RE/S training (~466k trajectories) | 109G | 841G | 9e3ed4d1e497bfdce4472ee70455edef |
| IS2RE/S Validation | | | |
| val_id (~25K trajectories) | 5.9G | 46G | fcb71363018fb1e7127db2500e39e11a |
| val_ood_ads (~25K trajectories) | 5.7G | 44G | 5ced8ea84584aa229d31e693e0fb090f |
| val_ood_cat (~25K trajectories) | 6.0G | 46G | 88dcc02fd8c174a72d2c416878fc44ff |
| val_ood_both (~25K trajectories) | 4.4G | 35G | bc74b6474a13542cc56eaa97bd51adfc |

Per-adsorbate trajectories (optional download)

Download links are in the table below:

| Adsorbate symbol | Downloadable path | Size | MD5 checksum |
| --- | --- | --- | --- |
| *O | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/0.tar | 1006M | d4151542856b4b6405f276808f75358a |
| *H | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/1.tar | 850M | 3697f04faf04251a23da8b88a78209f7 |
| *OH | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/2.tar | 1.6G | a21081f3f55eb0c98a91021bbe3dac44 |
| *OH2 | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/3.tar | 1.8G | b12b706854f5d899e02a9ae6578b5d45 |
| *C | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/4.tar | 1.1G | e4fe9890764fcf59e01e3ceab089b978 |
| *CH | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/6.tar | 1.4G | ec9aa2c4c4bd4419359438ba7fbb881d |
| *CHO | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/7.tar | 1.4G | d32200f74ad5c3bfd42e8835f36d57ab |
| *COH | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/8.tar | 1.6G | 5418a1b331f6c7689a5405cca4cc8d15 |
| *CH2 | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/9.tar | 1.6G | 8ee1066149c305d7c17c219b369c5a73 |
| CH2O | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/10.tar | 1.7G | 960c2450814024b66f3c79121179ac60 |
| *CHOH | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/11.tar | 1.8G | 60ac9f965f9589a3389483e3d1e58144 |
| *CH3 | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/12.tar | 1.7G | 7e123e6f4fb10d6897be3f47721dfd4a |
| *OCH3 | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/13.tar | 1.8G | 0823047bbbe05fa0e63f9d83ec601487 |
| *CH2OH | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/14.tar | 1.9G | 9ac71e198d75b1427182cd34abb73e4d |
| *CH4 | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/15.tar | 1.9G | a405ce403018bf8afbd4425d5c0b34d5 |
| *OHCH3 | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/16.tar | 2.1G | d3c829f1952db6e4f428273ee05f59b1 |
| CC | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/17.tar | 1.5G | d687a151345305897b9245af4b0f9967 |
| *CCO | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/18.tar | 1.7G | 214ca96e620c5ec6e8a6ff8144a22a04 |
| *CCH | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/19.tar | 1.6G | da2268545e80ca1664026449dd2fdd24 |
| *CHCO | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/20.tar | 1.7G | 386c99407fe63080d26cda525dfdd8cd |
| *CCHO | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/21.tar | 1.8G | 918b20960438494ab160a9dbd9668157 |
| *COCHO | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/22.tar | 1.8G | 84424aa2ad30301e23ece1438ea39923 |
| *CCHOH | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/23.tar | 2.0G | 3cc90425ec042a70085ba7eb2916a79a |
| *CCH2 | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/24.tar | 1.8G | 9dbcf7566e40965dd7f8a186a75a718e |
| CHCH | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/25.tar | 1.7G | a193b4c72f915ba0b21a41790696b23c |
| CH2*CO | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/26.tar | 1.8G | de83cf50247f5556fa4f9f64beff1eeb |
| *CHCHO | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/27.tar | 1.9G | 1d140aaa2e7b287124ab38911a711d70 |
| CHCOH | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/28.tar | 1.3G | 682d8a6b05ca5948b34dc5e5f6bbcd61 |
| *COCH2O | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/29.tar | 1.9G | c8742faa8ca40e8edb4110069817fa70 |
| CHOCHO | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/30.tar | 2.0G | 8cfbb67beb312b98c40fcb891dfa480a |
| *COHCHO | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/31.tar | 1.9G | 6ffa903a62d8ec3319ecec6a03b06276 |
| *COHCOH | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/32.tar | 2.0G | caca0058b641bfdc9f8de4527e60feb7 |
| *CCH3 | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/33.tar | 1.8G | 906543aaefc171edab388ff4f0fe8a20 |
| *CHCH2 | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/34.tar | 1.8G | 4dfab479495f76179749c1956046fbd8 |
| *COCH3 | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/35.tar | 1.9G | 29d1b992715054e920e8bb2afe97b393 |
| *CHCHOH | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/38.tar | 2.0G | 9e5912df6f7b11706d1046cdb9e3087e |
| *CCH2OH | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/39.tar | 2.1G | 7bcae43cee451306e34ec416588a7f09 |
| *CHOCHOH | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/40.tar | 2.0G | f98866d08fe3451ae7ebc47bb51599aa |
| *COCH2OH | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/41.tar | 1.4G | bfaf689e5827fcf26c51e567bb8dd1be |
| *COHCHOH | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/42.tar | 2.0G | 236fe4e950aa2fbdde94ef2821fb48d2 |
| *OCHCH3 | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/44.tar | 2.1G | 66acc5460a999625c3364f0f3bcca871 |
| *COHCH3 | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/45.tar | 2.1G | bb4a01956736399c8cee5e219f8c1229 |
| *CHOHCH2 | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/46.tar | 2.1G | e836de4ec146b1b611533f1ef682cace |
| *CHCH2OH | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/47.tar | 2.0G | 66df44121806debef6dc038df7115d1d |
| *OCH2CHOH | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/48.tar | 2.2G | ff6981fdbcd2e65d351505c15d218d76 |
| *CHOCH2OH | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/49.tar | 2.1G | 448f7d352ab6e32f754e24de64ca302a |
| *COHCH2OH | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/50.tar | 2.1G | 8bff6bf3e10cc84acc4a283a375fcc23 |
| *CHOHCHOH | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/51.tar | 2.0G | 9c9e4d617d306751760a80f1453e71f1 |
| *CH2CH3 | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/52.tar | 2.0G | ec1e964d2ee6f468fa5773743e3994a4 |
| *OCH2CH3 | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/53.tar | 2.1G | d297b27b02822f9b6af80bdb64aee819 |
| *CHOHCH3 | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/54.tar | 2.1G | 368de083dafdc3bbdb560d35e2a102c0 |
| *CH2CH2OH | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/55.tar | 2.1G | 3c1aaf790659f7ff89bf1eed8b396b63 |
| *CHOHCH2OH | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/56.tar | 2.2G | 2d71adb9e305e6f3bca49e5df9b5a86a |
| *OHCH2CH3 | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/57.tar | 2.3G | cf51128f8522b7b66fc68d79980d6def |
| *NH2N(CH3)2 | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/58.tar | 1.6G | 36ba974d80c20ff636431f7c0ad225da |
| *ONN(CH3)2 | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/59.tar | 2.3G | fdc4cd19977496909d61be4aee61c4f1 |
| *OHNNCH3 | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/60.tar | 2.1G | 50a6ff098f9ba7adbba9ac115726cc5a |
| *ONH | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/62.tar | 1.8G | 47573199c545afe46c554ff756c3e38f |
| *NHNH | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/63.tar | 1.7G | dd456b7e19ef592d9f0308d911b91d7c |
| NNH | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/65.tar | 1.6G | c05289fd56d64c74306ebf57f1061318 |
| *NO2NO2 | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/67.tar | 2.1G | 4822a06f6c5f41bdefd3cbbd8856c11f |
| NNO | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/68.tar | 1.6G | 2a27de122d32917cc5b6ac0a21c63c1c |
| *N2 | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/69.tar | 1.5G | cc668fecf679b6edaac8fd8fb9cdd404 |
| *ONNH2 | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/70.tar | 2.1G | dff880f1a5baa7f67b52fd3ed745443d |
| *NH2 | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/71.tar | 1.6G | c7f383b50faa6244e265c9611466cb8f |
| *NH3 | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/72.tar | 1.9G | 2b355741f9300445703270e0e4b8c01c |
| *NONH | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/73.tar | 1.8G | 48877a0c6f2994baac82cb722711aaa2 |
| *NH | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/74.tar | 1.4G | 7979b9e7ab557d6979b33e352486f0ef |
| *NO2 | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/75.tar | 1.7G | 9f352fbc32bb2b8caf4788aba28b2eb7 |
| *NO | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/76.tar | 1.4G | 482ee306a5ae2eee78cac40d10059ebc |
| *N | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/77.tar | 1.1G | bfb6e03d4a687987ff68976f0793cc46 |
| *NO3 | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/78.tar | 1.8G | 700834326e789a6e38bf3922d9fcb792 |
| *OHNH2 | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/79.tar | 2.1G | fa24472e0c02c34d91f3ffe6b77bfb11 |
| *ONOH | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/80.tar | 1.4G | 4ddcccd62a834a76fe6167461f512529 |
| *CN | https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res/81.tar | 1.5G | bc7c55330ece006d09496a5ff01d5d50 |

Note - A few adsorbates are intentionally left out for the test splits.
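Every path in the table above follows the same pattern, a shared base URL followed by the adsorbate index and .tar, so the link for a given index can be built rather than copied. The sketch below encodes that pattern; the function name is hypothetical, and the set of skipped indices is simply read off the gaps in the table.

```python
BASE_URL = "https://dl.fbaipublicfiles.com/opencatalystproject/data/per_adsorbate_is2res"

# Indices in 0..81 that have no row in the table above.
UNLISTED_INDICES = {5, 36, 37, 43, 61, 64, 66}


def per_adsorbate_url(index: int) -> str:
    """Return the download URL for an adsorbate index listed in the table."""
    if not 0 <= index <= 81:
        raise ValueError(f"adsorbate index must be in 0..81, got {index}")
    if index in UNLISTED_INDICES:
        raise ValueError(f"adsorbate index {index} has no per-adsorbate download")
    return f"{BASE_URL}/{index}.tar"


print(per_adsorbate_url(69))  # the *N2 row in the table
```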

Downloading any of the above and extracting will result in a folder:

<index>/

where <index> ranges from 0 to 81. The number of trajectory files in the folder, N, depends on which adsorbate index is chosen.

The file system.txt has information in the following format: system_id,reference_energy

where:

The .extxyz.xz files are LZMA-compressed .extxyz trajectory files. Each trajectory corresponds to a relaxation trajectory of a different adsorbate+catalyst system. Information about the .extxyz trajectory file format may be found at https://wiki.fysik.dtu.dk/ase/dev/ase/io/formatoptions.html#extxyz.

To uncompress the files, uncompress.py provides a multi-core implementation.
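For a handful of files, Python's standard library is enough to inspect an extracted folder. The sketch below is not the uncompress.py implementation: it is a single-process illustration that parses system.txt and decompresses one trajectory with the lzma module. It assumes system.txt has no header line, and the function names are hypothetical.

```python
import lzma
import shutil
from pathlib import Path
from typing import Dict


def read_reference_energies(system_txt: str) -> Dict[str, float]:
    """Parse lines of the form system_id,reference_energy into a dictionary."""
    energies: Dict[str, float] = {}
    with open(system_txt) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            system_id, energy = line.split(",", 1)
            energies[system_id] = float(energy)
    return energies


def decompress_xz(src: str) -> Path:
    """Decompress <name>.extxyz.xz next to the source file as <name>.extxyz."""
    src_path = Path(src)
    dst_path = src_path.with_suffix("")  # drops only the trailing ".xz"
    with lzma.open(src_path, "rb") as fin, open(dst_path, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # streams, so large files stay out of memory
    return dst_path
```

The decompressed .extxyz file can then be read with a tool that understands the format, for example ASE's `ase.io.read` if ASE is installed.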

Catalyst system trajectories (optional download)

| Number | Size of compressed version (in bytes) | Size of uncompressed version (in bytes) | MD5 checksum (download link) |
| --- | --- | --- | --- |
| 294k systems | 20G | 151G | 347f4183465810e9b384e7a033baefc7 |

Bader charge data

We provide Bader charge data for all final frames of our train + validation systems in OC20 (for both S2EF and IS2RE/RS tasks). A .tar.gz file, when downloaded and uncompressed, will contain several directories named by unique system-ids (of the format random<XYZ>, where XYZ is an integer). Each directory contains the raw Bader charge analysis outputs. For more details on the Bader charge analysis, see https://theory.cm.utexas.edu/henkelman/research/bader/.

Downloadable link: https://dl.fbaipublicfiles.com/opencatalystproject/data/oc20_bader_data.tar (MD5 checksum: aecc5e23542de49beceb4b7e44c153b9)

OC20 mappings

Data mapping information

We provide a Python pickle file containing information about the slab and adsorbates for each of the systems in the OC20 dataset. Loading the pickle file will load a Python dictionary. The keys of this dictionary are the adsorbate+catalyst system-ids (of the format random<XYZ>, where XYZ is an integer), and the corresponding value of each key is a dictionary with information about the bulk, the adsorbate, and the surface, as shown in the example entry below.

Downloadable link: https://dl.fbaipublicfiles.com/opencatalystproject/data/oc20_data_mapping.pkl (MD5 checksum: 6b5d485019861f6e7efca38338375b61)

An example entry is

'random2181546': {'bulk_id': 6510,
  'ads_id': 69,
  'bulk_mpid': 'mp-22179',
  'bulk_symbols': 'Si2Ti2Y2',
  'ads_symbols': '*N2',
  'miller_index': (2, 0, 1),
  'shift': 0.145,
  'top': True,
  'adsorption_site': ((4.5, 12.85, 16.13),),
  'class': 1,
  'anomaly': 0}
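Once downloaded, the mapping can be loaded with the standard pickle module and filtered like any dictionary. The sketch below assumes every entry carries the keys shown in the example above; the function names and the local file name are hypothetical. As with any pickle, only load a file obtained from a source you trust.

```python
import pickle
from typing import Any, Dict, List


def load_mapping(path: str) -> Dict[str, Dict[str, Any]]:
    """Load the system-id -> metadata dictionary from a pickle file."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def systems_with_adsorbate(
    mapping: Dict[str, Dict[str, Any]], ads_symbols: str
) -> List[str]:
    """Return the sorted system-ids whose 'ads_symbols' field matches."""
    return sorted(
        system_id
        for system_id, info in mapping.items()
        if info["ads_symbols"] == ads_symbols
    )


# Example calls (the local path is illustrative):
# mapping = load_mapping("oc20_data_mapping.pkl")
# n2_systems = systems_with_adsorbate(mapping, "*N2")
```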

Adsorbate-catalyst system to catalyst system mapping information

We provide a Python pickle file containing information about the mapping from adsorbate-catalyst systems to their corresponding catalyst systems. Loading the pickle file will load a Python dictionary. The keys of this dictionary are the adsorbate+catalyst system-ids (of the format random<XYZ> where XYZ is an integer), and values will be the catalyst system-ids (of the format random<PQR> where PQR is an integer).

Downloadable link: https://dl.fbaipublicfiles.com/opencatalystproject/data/mapping_adslab_slab.pkl (MD5 checksum: 079041076c3f15d18ecb5d17c509cdfe)

An example entry is

'random1981709': 'random533137'
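Because several adsorbate+catalyst systems can share one catalyst system, it is often useful to invert this mapping. The sketch below groups adsorbate+catalyst system-ids by catalyst system-id; it takes the already loaded dictionary as input, and the function name and the second sample id are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_slab(adslab_to_slab: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert the adslab -> slab mapping into slab -> sorted list of adslabs."""
    slab_to_adslabs: Dict[str, List[str]] = defaultdict(list)
    for adslab_id, slab_id in adslab_to_slab.items():
        slab_to_adslabs[slab_id].append(adslab_id)
    return {slab: sorted(ids) for slab, ids in slab_to_adslabs.items()}


# The first pair is the example entry above; the second id is made up.
sample = {"random1981709": "random533137", "random42": "random533137"}
print(group_by_slab(sample))
```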

Dataset changelog

September 2021

March 2021

Version 2, Feb 2021

Modifications:

Below are the actual updated numbers, of the form old -> new

Total S2EF frames:

Total IS2RE and IS2RS systems:

Version 1, Oct 2020

Total S2EF frames:

Total IS2RE and IS2RS systems:

Citing OC20

The Open Catalyst 2020 (OC20) dataset is licensed under a Creative Commons Attribution 4.0 License.

Please consider citing the following paper in any research manuscript using the OC20 dataset:

@article{ocp_dataset,
    author = {Chanussot*, Lowik and Das*, Abhishek and Goyal*, Siddharth and Lavril*, Thibaut and Shuaibi*, Muhammed and Riviere, Morgane and Tran, Kevin and Heras-Domingo, Javier and Ho, Caleb and Hu, Weihua and Palizhati, Aini and Sriram, Anuroop and Wood, Brandon and Yoon, Junwoong and Parikh, Devi and Zitnick, C. Lawrence and Ulissi, Zachary},
    title = {Open Catalyst 2020 (OC20) Dataset and Community Challenges},
    journal = {ACS Catalysis},
    year = {2021},
    doi = {10.1021/acscatal.0c04525},
}

References
  1. Chanussot, L., Das, A., Goyal, S., Lavril, T., Shuaibi, M., Riviere, M., Tran, K., Heras-Domingo, J., Ho, C., Hu, W., Palizhati, A., Sriram, A., Wood, B., Yoon, J., Parikh, D., Zitnick, C. L., & Ulissi, Z. (2021). Open Catalyst 2020 (OC20) Dataset and Community Challenges. ACS Catalysis, 11(10), 6059–6072. 10.1021/acscatal.0c04525