OC25

The Open Catalyst 2025 (OC25) dataset consists of nearly 8 million DFT calculations across 1.5 million unique explicit-solvent environments, with an average system size of 144 atoms. It is the largest and most diverse solid-liquid interface dataset currently available, and it provides both configurational and elemental diversity: 88 elements, commonly used solvents and ions, varying solvent layers, and off-equilibrium sampling.

The dataset enables training of state-of-the-art machine-learned interatomic potentials for applications in electrocatalysis. All structures are labeled with total energies (eV) and forces (eV/Å) computed using VASP with the RPBE+D3 functional.

All information about the dataset is available on the OC25 Hugging Face site. For questions or issues, please open a GitHub issue in this repository.

Dataset format

The dataset is provided as ASE DB-compatible LMDB files (*.aselmdb).
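As a rough illustration of working with this format, the sketch below collects the *.aselmdb files under a download directory and then opens them to inspect one structure. Several things here are assumptions rather than documented OC25 usage: the directory path is hypothetical, the helper names `find_aselmdb_files` and `load_first_structure` are made up for this example, and the loader assumes the `fairchem-core` package is installed and that its `AseDBDataset` class accepts a `{"src": ...}` config for OC25 as it does for other fairchem datasets. It also assumes the energy and force labels are attached to each structure as calculator results.

```python
from pathlib import Path


def find_aselmdb_files(root):
    """Return every *.aselmdb file under root, sorted for a stable order."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(root.rglob("*.aselmdb"))


def load_first_structure(src):
    """Open a directory of *.aselmdb files and return its first ase.Atoms.

    Assumption: fairchem-core is installed and AseDBDataset takes a
    {"src": ...} config. Imported lazily so the file search above works
    without the optional dependency.
    """
    from fairchem.core.datasets import AseDBDataset

    dataset = AseDBDataset({"src": str(src)})
    return dataset.get_atoms(0)


if __name__ == "__main__":
    data_dir = Path("path/to/oc25/train")  # hypothetical location
    files = find_aselmdb_files(data_dir)
    print(f"found {len(files)} .aselmdb files")
    if files:
        atoms = load_first_structure(data_dir)
        print(atoms.get_chemical_formula())
        # Assumed to be stored as calculator results on each structure:
        print(atoms.get_potential_energy())  # total energy, eV
        print(atoms.get_forces().shape)      # (n_atoms, 3), eV/Å
```

Keeping the file discovery separate from the loader makes it possible to check that a download is complete before installing or importing any heavier dependency.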

Citing OC25

The OC25 dataset is licensed under a Creative Commons Attribution 4.0 License.

Please consider citing the following paper in any publication that uses this dataset:

@misc{oc25,
    title={The Open Catalyst 2025 (OC25) Dataset and Models for Solid-Liquid Interfaces},
    author={Sushree Jagriti Sahoo and Mikael Maraschin and Daniel S. Levine and Zachary Ulissi and C. Lawrence Zitnick and Joel B Varley and Joseph A. Gauthier and Nitish Govindarajan and Muhammed Shuaibi},
    year={2025},
    eprint={},
    archivePrefix={arXiv},
    primaryClass={},
    url={},
}