Open Direct Air Capture 2023 (ODAC23)

Dataset Overview
| Property | Value |
|---|---|
| Size | ~1M structures (S2EF), ~10K structures (IS2RE) |
| Domain | Metal-organic frameworks (MOFs) with CO2 |
| Labels | Total energy (eV), forces (eV/Å) |
| Level of theory | PBE+D3 (VASP) |
| License | CC BY 4.0 |
| Status | Deprecated (use ODAC25) |

Structure to Energy and Forces (S2EF) task

We provide precomputed LMDBs for the train, validation, and test sets that can be used directly with the dataloaders provided in our code. The LMDBs contain input structures from every point of the relaxation trajectories, along with each structure's energy and atomic forces. The dataset contains an in-domain test set and 4 out-of-domain test sets (ood-large, ood-linker, ood-topology, and ood-linker & topology). All LMDBs are compressed into a single .tar.gz file.

| Splits | Compressed size | Uncompressed size | MD5 checksum |
|---|---|---|---|
| Train + Validation + Test (all splits) | 172G | 476G | 162f0660b2f1c9209c5b57f7b9e545a7 |
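Given the size of this archive, it is worth confirming the download before extracting it. The following is a minimal, standard-library-only sketch for comparing a file against a published MD5 checksum; the function names `md5sum` and `verify_download` are illustrative and are not part of our code.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks.

    Chunked reading keeps memory use flat even for multi-hundred-GB archives.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns b"" at end of file
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Raise ValueError if the file's MD5 does not match the expected checksum."""
    actual = md5sum(path)
    if actual != expected_md5.lower():
        raise ValueError(f"{path}: expected MD5 {expected_md5}, got {actual}")
    return True
```

For example, `verify_download("path/to/archive.tar.gz", "162f0660b2f1c9209c5b57f7b9e545a7")` raises `ValueError` on a mismatch (the path is a placeholder). Hashing is I/O-bound, so checking the full 172G archive can still take a while.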

The train and validation splits are also available in extxyz format. Each trajectory is stored in a separate extxyz file.

| Splits | Compressed size | Uncompressed size | MD5 checksum |
|---|---|---|---|
| Train | 232G | 781G | 381e72fd8b9c055065fd3afff6b0945b |
| Val | 5.1G | 18G | 09913759c6e0f8d649f7ec9dff9e0e8b |
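The extxyz files can be read with standard tools such as ASE, where `ase.io.read(path, index=":")` returns every frame of a file. For a dependency-free look at a single trajectory, the sketch below walks the frames of an extxyz file and pulls per-frame energies from each frame's comment line. It assumes the energy is stored under an `energy=` key in that line; this is an assumption about how the files were written, not a documented schema, so inspect one file before relying on it.

```python
import shlex


def parse_comment_line(line):
    """Split an extxyz comment line into a dict of key=value pairs.

    shlex handles quoted values such as Lattice="..." and pbc="T T T".
    Bare tokens without '=' are stored as True.
    """
    info = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        info[key] = value if sep else True
    return info


def read_extxyz_frames(path):
    """Yield (n_atoms, info_dict, atom_lines) for each frame in an extxyz file."""
    with open(path) as handle:
        lines = handle.read().splitlines()
    i = 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # tolerate stray blank lines between frames
            continue
        n_atoms = int(lines[i])  # first line of a frame: atom count
        info = parse_comment_line(lines[i + 1])  # second line: properties
        atom_lines = lines[i + 2 : i + 2 + n_atoms]  # one line per atom
        yield n_atoms, info, atom_lines
        i += 2 + n_atoms


def trajectory_energies(path, energy_key="energy"):
    """Return per-frame energies (eV), assuming they are stored under energy_key."""
    return [float(info[energy_key]) for _, info, _ in read_extxyz_frames(path)]
```

Each yielded frame keeps the raw per-atom lines, so positions and forces can be parsed from them using the column layout declared in the frame's `Properties=` entry.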

Initial Structure to Relaxed Structure (IS2RS) / Relaxed Energy (IS2RE) tasks

For the IS2RE / IS2RS training, validation, and test sets, we provide precomputed LMDBs that can be used directly with the dataloaders provided in our code. The LMDBs contain the initial input structures along with the corresponding relaxed structures and energies. The dataset contains an in-domain test set and 4 out-of-domain test sets (ood-large, ood-linker, ood-topology, and ood-linker & topology). All LMDBs are compressed into a single .tar.gz file.

| Splits | Compressed size | Uncompressed size | MD5 checksum |
|---|---|---|---|
| Train + Validation + Test (all splits) | 809M | 2.2G | f7f2f58669a30abae8cb9ba1b7f2bcd2 |
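Before extracting, it can help to check what an archive contains and how much disk space it will need. This sketch streams through a `.tar.gz` with Python's standard `tarfile` module without writing anything to disk. The `.lmdb` suffix filter is an assumption about how the LMDB files inside the archive are named, and `summarize_archive` is a hypothetical helper, not part of our code.

```python
import tarfile


def summarize_archive(archive_path, suffix=".lmdb"):
    """Inspect a .tar.gz archive without extracting it.

    Returns (matching_file_names, total_uncompressed_bytes), where the byte
    total covers every regular file in the archive, not just the matches.
    """
    matches = []
    total_bytes = 0
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar:  # TarFile iterates over its members in order
            if not member.isfile():
                continue  # skip directories, links, and other entries
            total_bytes += member.size
            if member.name.endswith(suffix):
                matches.append(member.name)
    return sorted(matches), total_bytes
```

Because gzip streams cannot be seeked, listing the members still decompresses the whole archive. That is quick for this 809M archive but slow for the much larger S2EF archives.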

DDEC Charges

We provide DDEC charges computed for the MOFs in the ODAC23 dataset. A small fraction of MOFs (~2%) have no charges because their DDEC calculations failed.

| Compressed size | Uncompressed size | MD5 checksum |
|---|---|---|
| 147M | 534M | 81927b78d9e4184cc3c398e79760126a |
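Because some MOFs have no DDEC charges, downstream code should handle missing entries explicitly. The sketch below assumes the charges have already been loaded into a hypothetical dict that maps a MOF identifier to a list of per-atom charges (in units of e), with `None` for MOFs whose DDEC calculation failed. The file format of the released charges is not described here, so the loading step is left out. The sketch also includes a sanity check that the charges of a neutral framework sum to approximately zero, using an arbitrarily chosen tolerance.

```python
import math


def split_by_charge_availability(charges_by_mof):
    """Partition MOF identifiers into (with_charges, missing_charges).

    charges_by_mof is a hypothetical dict: MOF identifier -> list of per-atom
    DDEC charges (e), or None (or an empty list) when DDEC failed.
    """
    with_charges = sorted(k for k, v in charges_by_mof.items() if v)
    missing = sorted(k for k, v in charges_by_mof.items() if not v)
    return with_charges, missing


def is_charge_neutral(charges, tol=1e-2):
    """Return True if the per-atom charges sum to 0 e within tol (assumes a neutral cell)."""
    # fsum reduces floating-point error when summing many small charges
    return math.isclose(math.fsum(charges), 0.0, abs_tol=tol)
```

For example, a model that uses DDEC charges as input features could train only on the identifiers in `with_charges`, rather than filling in values for the ~2% of MOFs without them.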

Citing ODAC23

The OpenDAC 2023 (ODAC23) dataset is licensed under a Creative Commons Attribution 4.0 License.

Please consider citing the following paper in any research manuscript using the ODAC23 dataset:

@article{odac23_dataset,
    author = {Anuroop Sriram and Sihoon Choi and Xiaohan Yu and Logan M. Brabson and Abhishek Das and Zachary Ulissi and Matt Uyttendaele and Andrew J. Medford and David S. Sholl},
    title = {The Open DAC 2023 Dataset and Challenges for Sorbent Discovery in Direct Air Capture},
    year = {2023},
    journal = {arXiv preprint arXiv:2311.00341},
}