Open Catalyst 2020 Dense (OC20Dense)
Overview
OC20Dense is a validation dataset that was used to assess model performance in AdsorbML: A Leap in Efficiency for Adsorption Energy Calculations using Generalizable Machine Learning Potentials. It contains a dense sampling of adsorbate configurations on ~1,000 randomly selected adsorbate+surface combinations from the OC20 dataset, for a total of 85,658 unique input configurations. The dataset and its accompanying paper support the determination of the global minimum adsorbate-surface energy (the adsorption energy). This differs from OC20, which contains local adsorbate relaxations. Under low-coverage conditions, the global minimum energy site is the most likely to be occupied. In computational catalysis research, the adsorption energy is correlated with important figures of merit, so obtaining it is an important task.
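To make the difference between local relaxations and the global minimum concrete, here is a minimal sketch. It assumes a hypothetical in-memory dictionary that maps each system to the relaxed adsorption energies (in eV) of its sampled configurations. The system names, configuration names, and energies are invented for illustration and are not taken from the dataset.

```python
def global_min_adsorption_energy(energies_by_config):
    """Return (config_id, energy) for the lowest-energy configuration."""
    if not energies_by_config:
        raise ValueError("no configurations provided")
    best_config = min(energies_by_config, key=energies_by_config.get)
    return best_config, energies_by_config[best_config]


# Hypothetical data: system -> {configuration -> relaxed adsorption energy (eV)}.
example_energies = {
    "system_a": {"heur0": -0.42, "heur1": -0.57, "rand0": -0.61},
    "system_b": {"heur0": 0.13, "rand0": 0.08},
}

# Each relaxation yields a local minimum; the adsorption energy of a system
# is the lowest value found across all of its configurations.
global_minima = {
    system: global_min_adsorption_energy(configs)
    for system, configs in example_energies.items()
}
print(global_minima)
```

A denser sampling of initial configurations makes it more likely that the lowest local minimum found is in fact the global one.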
File Contents and Download
| Splits | Size of compressed version (in bytes) | Size of uncompressed version (in bytes) | MD5 checksum (download link) |
|---|---|---|---|
| LMDB | 654M | 9.8G | |
| ASE Trajectories | 29G | 112G | |
The following files are also provided to be used for evaluation and general information:

- `oc20dense_mapping.pkl`: Mapping of the LMDB `sid` to general metadata information. If this file is not present, run the command `python src/fairchem/core/scripts/download_large_files.py adsorbml` from the root of the fairchem repo to download it.
  - `system_id`: Unique system identifier for an adsorbate, bulk, surface combination.
  - `config_id`: Unique configuration identifier, where `rand` and `heur` correspond to random and heuristic initial configurations, respectively.
  - `mpid`: Materials Project bulk identifier.
  - `miller_idx`: 3-tuple of integers indicating the Miller indices of the surface.
  - `shift`: C-direction shift used to determine cutoff for the surface (c-direction is following the nomenclature from Pymatgen).
  - `top`: Boolean indicating whether the chosen surface was at the top or bottom of the originally enumerated surface.
  - `adsorbate`: Chemical composition of the adsorbate.
  - `adsorption_site`: A tuple of 3-tuples containing the Cartesian coordinates of each binding adsorbate atom.
- `oc20dense_targets.pkl`: DFT adsorption energies across different system and placement ids.
- `oc20dense_compute.pkl`: DFT compute as measured in the number of ionic and scf steps for each evaluated relaxation.
- `oc20dense_ref_energies.pkl`: Reference energy used for a specified `system_id`. This energy includes the relaxed clean surface and the gas phase adsorbate energy to ensure consistency across calculations.
- `oc20dense_tags.pkl`: Tag information used for a specified `system_id`, where 0 = subsurface, 1 = surface, 2 = adsorbate.
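The metadata described above might be used roughly as follows. This is a sketch under stated assumptions: the helper names are hypothetical, `config_id` values are assumed to be strings containing `rand` or `heur`, and tags are assumed to be a per-atom sequence of integers. The internal layout of the `.pkl` files is not specified here, so the example works on made-up in-memory values instead of reading the files.

```python
# Tag meanings as listed for oc20dense_tags.pkl.
TAG_NAMES = {0: "subsurface", 1: "surface", 2: "adsorbate"}


def placement_kind(config_id):
    """Classify a configuration as a random or heuristic placement.

    Assumes the identifier contains the substring 'rand' or 'heur'.
    """
    if "rand" in config_id:
        return "random"
    if "heur" in config_id:
        return "heuristic"
    raise ValueError(f"unrecognized config_id: {config_id!r}")


def indices_by_tag(tags):
    """Group atom indices by tag meaning (subsurface, surface, adsorbate)."""
    groups = {name: [] for name in TAG_NAMES.values()}
    for index, tag in enumerate(tags):
        groups[TAG_NAMES[tag]].append(index)
    return groups


def adsorption_energy(total_energy, reference_energy):
    """Subtract the reference energy (clean surface plus gas phase adsorbate)
    from the total energy of the relaxed adsorbate+surface system."""
    return total_energy - reference_energy


# Made-up values for illustration only.
print(placement_kind("heur3"))
print(indices_by_tag([0, 0, 1, 1, 2, 2]))
print(adsorption_energy(-310.25, -309.75))
```

Keeping the reference energy per `system_id` means every configuration of a system is measured against the same baseline, so energies from different placements can be compared directly.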
All mapping files can be downloaded from: https://dl.fbaipublicfiles.com/opencatalystproject/data/adsorbml/oc20_dense_mappings.tar.gz
MD5 checksums:
c18735c405ce6ce5761432b07287d8d9 oc20_dense_mappings.tar.gz
3e26c3bcef01ccfc9b001931065ea6e6 oc20dense_mapping.pkl
fd589b013b72e62e11a6b2a5bd1d323c oc20dense_targets.pkl
78d25997e0aaf754df526ab37276bb89 oc20dense_compute.pkl
b07c64158e4bfa5f7b9bf6263753ecc5 oc20dense_ref_energies.pkl
1ba0bc266130f186850f5faa547b6a02 oc20dense_tags.pkl
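The checksums above can be verified after downloading with a short script such as the following sketch, which uses only Python's standard `hashlib` module. The function names and the assumption that the files sit in a single local directory are illustrative choices, not part of the dataset tooling.

```python
import hashlib
import os

# Expected MD5 checksums, copied from the list above.
EXPECTED_MD5 = {
    "oc20_dense_mappings.tar.gz": "c18735c405ce6ce5761432b07287d8d9",
    "oc20dense_mapping.pkl": "3e26c3bcef01ccfc9b001931065ea6e6",
    "oc20dense_targets.pkl": "fd589b013b72e62e11a6b2a5bd1d323c",
    "oc20dense_compute.pkl": "78d25997e0aaf754df526ab37276bb89",
    "oc20dense_ref_energies.pkl": "b07c64158e4bfa5f7b9bf6263753ecc5",
    "oc20dense_tags.pkl": "1ba0bc266130f186850f5faa547b6a02",
}


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_downloads(directory, expected=EXPECTED_MD5):
    """Return {filename: True/False/None}; None means the file is missing."""
    results = {}
    for name, checksum in expected.items():
        path = os.path.join(directory, name)
        if not os.path.exists(path):
            results[name] = None
        else:
            results[name] = md5sum(path) == checksum.lower()
    return results


if __name__ == "__main__":
    for name, ok in check_downloads(".").items():
        status = "missing" if ok is None else ("ok" if ok else "MISMATCH")
        print(f"{name}: {status}")
```

Reading in chunks keeps memory use small, which matters for the larger archives.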