OC25
The Open Catalyst 2025 (OC25) dataset consists of nearly 8 million DFT calculations across 1.5 million unique explicit solvent environments, with an average system size of 144 atoms. It is the largest and most diverse solid-liquid interface dataset currently available, with configurational and elemental diversity that spans 88 elements, commonly used solvents and ions, varying solvent layers, and off-equilibrium sampling.
The dataset enables training of state-of-the-art machine-learned interatomic potentials for applications in electrocatalysis. All structures are labeled with total energies (eV) and forces (eV/Å) computed using VASP with the RPBE+D3 functional.
All information about the dataset is available on the OC25 Hugging Face site. For questions or issues, please open a GitHub issue in this repository.
Dataset format
The dataset is provided in ASE DB compatible lmdb files (*.aselmdb).
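As a minimal sketch of how these files might be read, the helpers below locate the `*.aselmdb` shards in a local download and open them as one dataset. This assumes that the `fairchem-core` package is installed and that its `AseDBDataset` class accepts a `{"src": ...}` config, and that each entry stores its energy and forces as calculator results; the function names and the directory layout are hypothetical and not part of the OC25 release.

```python
from pathlib import Path


def find_aselmdb_files(root):
    """Return every *.aselmdb file under `root`, sorted for reproducibility."""
    return sorted(Path(root).rglob("*.aselmdb"))


def load_oc25_split(src):
    """Open a directory (or a single file) of *.aselmdb shards as one dataset.

    Assumption: fairchem-core is installed and provides AseDBDataset, which
    takes a config dict whose "src" key points at the data location.
    """
    # Imported lazily so the file-discovery helper works without fairchem.
    from fairchem.core.datasets import AseDBDataset

    return AseDBDataset(config={"src": str(src)})


def read_labels(dataset, index=0):
    """Return (total energy in eV, forces in eV/Å) for one structure.

    Assumption: the labels are attached to the returned ase.Atoms object
    as stored calculator results.
    """
    atoms = dataset.get_atoms(index)
    return atoms.get_potential_energy(), atoms.get_forces()
```

For example, with a hypothetical local path, `load_oc25_split("data/oc25/train")` would open the shards and `read_labels(dataset, 0)` would return the energy and the per-atom force array of the first structure. Check the OC25 Hugging Face site for the actual split names and layout.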
Citing OC25
The OC25 dataset is licensed under a Creative Commons Attribution 4.0 License.
Please consider citing the following paper in any publication that uses this dataset:
@misc{oc25,
title={The Open Catalyst 2025 (OC25) Dataset and Models for Solid-Liquid Interfaces},
author={Sushree Jagriti Sahoo and Mikael Maraschin and Daniel S. Levine and Zachary Ulissi and C. Lawrence Zitnick and Joel B Varley and Joseph A. Gauthier and Nitish Govindarajan and Muhammed Shuaibi},
year={2025},
eprint={},
archivePrefix={arXiv},
primaryClass={},
url={},
}