OC20 Leaderboard

This leaderboard evaluates performance on the Open Catalyst 2020 (OC20) dataset, a large-scale dataset for catalyst discovery containing DFT relaxations across a wide variety of adsorbate-catalyst combinations. See the OC20 paper for more details.

The leaderboard supports two tasks:

S2EF (Structure to Energy and Forces): Predict energy and per-atom forces given an atomic structure.
IS2RE (Initial Structure to Relaxed Energy): Predict the relaxed energy given the initial structure.

Both tasks are evaluated across four test splits:

ID: In-domain test set
OOD-Ads: Out-of-domain adsorbates
OOD-Cat: Out-of-domain catalysts
OOD-Both: Out-of-domain adsorbates and catalysts

Download

Benchmarks	URL
S2EF	Test
IS2RE	Train+Val+Test

Install the necessary packages

pip install "fairchem-core>=2.5.0"

S2EF

Predictions must be saved as “.npz” files containing the following keys for each split (id, ood_ads, ood_cat, ood_both):

{split}_ids <class 'numpy.ndarray'>
{split}_energy <class 'numpy.ndarray'>
{split}_forces <class 'numpy.ndarray'>
{split}_chunk_idx <class 'numpy.ndarray'>

Where,

{split}_ids corresponds to the unique system identifiers
{split}_energy is the predicted energy for each system
{split}_forces is the predicted forces, concatenated across all systems
{split}_chunk_idx is the cumulative atom count used to split the concatenated forces back into per-system arrays

As an example:

import numpy as np
from fairchem.core.datasets import AseDBDataset
from fairchem.core import pretrained_mlip, FAIRChemCalculator

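# Define your MLIP calculator.
# Placeholder value: replace with the name of the pretrained model you want to evaluate.
checkpoint = "your-model-name"
predictor = pretrained_mlip.get_predict_unit(checkpoint, device="cuda")
calc = FAIRChemCalculator(predictor, task_name="oc20")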

results = {}
for split in ["id", "ood_ads", "ood_cat", "ood_both"]:
    dataset = AseDBDataset({"src": f"path/to/oc20/s2ef/test/{split}"})

    ids = []
    energy = []
    forces = []
    natoms = []
    for idx in range(len(dataset)):
        atoms = dataset.get_atoms(idx)
        atoms.calc = calc
        ids.append(atoms.info["sid"])
        natoms.append(len(atoms))
        energy.append(atoms.get_potential_energy())
        forces.append(atoms.get_forces())

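    # Stack the per-system force arrays into one (total_atoms, 3) array.
    forces = np.concatenate(forces)
    # Cumulative atom counts, without the final total, mark the system boundaries.
    chunk_idx = np.cumsum(natoms)[:-1]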

    results[f"{split}_ids"] = np.array(ids)
    results[f"{split}_energy"] = np.array(energy)
    results[f"{split}_forces"] = forces
    results[f"{split}_chunk_idx"] = chunk_idx

np.savez_compressed("oc20_s2ef_predictions.npz", **results)
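
The relationship between {split}_forces and {split}_chunk_idx can be checked on toy data. The sketch below uses made-up force values for three systems and assumes the per-system arrays are recovered with np.split, which matches the indices produced by np.cumsum(natoms)[:-1]. How the leaderboard itself performs the split is not stated here.

```python
import numpy as np

# Toy per-system forces (made-up values): three systems with 2, 3, and 1 atoms.
per_system_forces = [
    np.zeros((2, 3)),
    np.ones((3, 3)),
    np.full((1, 3), 2.0),
]
natoms = [len(f) for f in per_system_forces]

# Same convention as the S2EF example: concatenate the forces and keep the
# cumulative atom counts, without the final total, as split points.
forces = np.concatenate(per_system_forces)
chunk_idx = np.cumsum(natoms)[:-1]

# np.split cuts the flat array at each index in chunk_idx, giving one array
# per system again.
recovered = np.split(forces, chunk_idx)

for original, restored in zip(per_system_forces, recovered):
    assert np.array_equal(original, restored)
```

A round trip like this is a cheap sanity check to run on your own arrays before saving the file.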

IS2RE

Predictions must be saved as “.npz” files containing the following keys for each split (id, ood_ads, ood_cat, ood_both):

{split}_ids <class 'numpy.ndarray'>
{split}_energy <class 'numpy.ndarray'>

Where,

{split}_ids corresponds to the unique system identifiers
{split}_energy is the predicted relaxed energy for each system
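
A minimal sketch of assembling an IS2RE file in the key layout described above. The helper build_is2re_results, the output filename, and all ids and energies are hypothetical placeholders. Producing real relaxed-energy predictions is up to your model and is not shown.

```python
import numpy as np

SPLITS = ["id", "ood_ads", "ood_cat", "ood_both"]


def build_is2re_results(predictions):
    """Hypothetical helper: turn {split: (ids, energies)} into the flat key layout."""
    results = {}
    for split in SPLITS:
        ids, energy = predictions[split]
        ids = np.asarray(ids)
        energy = np.asarray(energy, dtype=float)
        # One predicted relaxed energy is expected per system identifier.
        if ids.shape != energy.shape:
            raise ValueError(f"{split}: ids and energy must have the same length")
        results[f"{split}_ids"] = ids
        results[f"{split}_energy"] = energy
    return results


# Placeholder values only: replace with your real system ids and predictions.
predictions = {split: ([0, 1, 2], [-0.1, 0.2, -0.3]) for split in SPLITS}

results = build_is2re_results(predictions)
# Hypothetical filename, chosen by analogy with the S2EF example.
np.savez_compressed("oc20_is2re_predictions.npz", **results)
```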

Once a prediction file is generated, proceed to the leaderboard, fill in the submission form, upload your file, select the corresponding evaluation type (“OC20 S2EF Test” or “OC20 IS2RE Test”) and hit submit. Stay on the page until you see the success message.
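
Before uploading, it can help to confirm that the file contains every expected key. The sketch below is an illustrative local check, not part of the leaderboard tooling. The helper missing_keys is hypothetical, and the demo builds a deliberately incomplete in-memory file so that the check has something to report.

```python
import io

import numpy as np

SPLITS = ["id", "ood_ads", "ood_cat", "ood_both"]
FIELDS = {
    "s2ef": ["ids", "energy", "forces", "chunk_idx"],
    "is2re": ["ids", "energy"],
}


def missing_keys(available, task):
    """Hypothetical helper: list the expected keys that are not in `available`."""
    expected = [f"{split}_{field}" for split in SPLITS for field in FIELDS[task]]
    return [key for key in expected if key not in available]


# Demo on an in-memory file that only covers the "id" split (made-up values).
buffer = io.BytesIO()
np.savez(buffer, id_ids=np.array([0]), id_energy=np.array([0.0]))
buffer.seek(0)

with np.load(buffer) as data:
    missing = missing_keys(set(data.files), "is2re")

print("missing keys:", missing)
```

To check a real file, pass its path to np.load in place of the buffer and use "s2ef" or "is2re" as the task.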