
OMat24

Dataset Overview
| Property | Value |
|---|---|
| Size | 1.07M structures (train) + 1.02M structures (val) |
| Domain | Inorganic bulk materials |
| Labels | Total energy (eV), forces (eV/Å), stress (eV/Å³) |
| Level of theory | DFT (PBE/PBE+U) |
| License | CC BY 4.0 |
| Benchmark | Matbench-Discovery compatible |

The Open Materials 2024 (OMat24) dataset contains a mix of single-point calculations of non-equilibrium structures and structural relaxations. Each structure is labeled with total energy (eV), forces (eV/Å), and stress (eV/Å³). The dataset is provided as ASE DB-compatible LMDB files.
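Stress labels are given in eV/Å³, while stresses are often reported in GPa. The conversion follows from the exact SI definitions 1 eV = 1.602176634 × 10⁻¹⁹ J and 1 Å³ = 10⁻³⁰ m³, so 1 eV/Å³ = 1.602176634 × 10¹¹ Pa ≈ 160.2 GPa. A minimal sketch; `ev_per_A3_to_GPa` is a hypothetical helper, not part of fairchem:

```python
# Exact SI definitions
EV_TO_J = 1.602176634e-19   # joules per electronvolt
A3_TO_M3 = 1e-30            # cubic metres per cubic angstrom

# 1 eV/Å^3 expressed in GPa (about 160.2)
EV_PER_A3_IN_GPA = EV_TO_J / A3_TO_M3 / 1e9


def ev_per_A3_to_GPa(stress):
    """Convert a stress value, or a sequence of components, from eV/Å^3 to GPa."""
    if isinstance(stress, (int, float)):
        return stress * EV_PER_A3_IN_GPA
    return [s * EV_PER_A3_IN_GPA for s in stress]


# Example: a 6-component stress vector in eV/Å^3 (values are made up)
stress_gpa = ev_per_A3_to_GPa([0.01, 0.01, 0.01, 0.0, 0.0, 0.0])
```

If you already work with ASE, dividing by `ase.units.GPa` performs the same conversion.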

The OMat24 train and val splits are fully compatible with the Matbench-Discovery benchmark test set:

  1. The splits do not contain any structure that has a protostructure label present in the initial or relaxed structures of the WBM dataset.

  2. The splits do not include any structure that was generated starting from an Alexandria relaxed structure whose protostructure label is present in the initial or relaxed structures of the WBM dataset.
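The two exclusion rules amount to a set-membership filter against the WBM protostructure labels. The sketch below is illustrative only: the `Candidate` record, its fields (`protostructure`, `alexandria_source_protostructure`), and the label strings are hypothetical placeholders, not the actual OMat24 metadata schema or the authors' code.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class Candidate:
    # Hypothetical fields, for illustration only
    structure_id: str
    protostructure: str
    # Protostructure label of the Alexandria relaxed structure this one was
    # generated from, or None if it was not generated from Alexandria
    alexandria_source_protostructure: Optional[str] = None


def matbench_compatible(cands: Iterable[Candidate], wbm_labels: Set[str]) -> List[Candidate]:
    """Keep only candidates that pass both exclusion rules.

    wbm_labels: protostructure labels of WBM initial and relaxed structures.
    """
    kept = []
    for c in cands:
        # Rule 1: the structure itself must not share a protostructure with WBM
        if c.protostructure in wbm_labels:
            continue
        # Rule 2: it must not be generated from an Alexandria structure whose label is in WBM
        if c.alexandria_source_protostructure in wbm_labels:
            continue
        kept.append(c)
    return kept


wbm = {"proto-A", "proto-B"}
cands = [
    Candidate("s1", "proto-A"),             # excluded by rule 1
    Candidate("s2", "proto-C", "proto-B"),  # excluded by rule 2
    Candidate("s3", "proto-C"),             # kept
]
print([c.structure_id for c in matbench_compatible(cands, wbm)])  # ['s3']
```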

Subdatasets

OMat24 is made up of several subdatasets, grouped by how the structures were generated. The subdatasets included are:

  1. rattled-1000-subsampled & rattled-1000

  2. rattled-500-subsampled & rattled-500

  3. rattled-300-subsampled & rattled-300

  4. aimd-from-PBE-1000-npt

  5. aimd-from-PBE-1000-nvt

  6. aimd-from-PBE-3000-npt

  7. aimd-from-PBE-3000-nvt

  8. rattled-relax

Note: Each rattled-<T> dataset comes as two subdatasets, rattled-<T> and rattled-<T>-subsampled. Both subdatasets in each pair were generated with the same procedure, as described in our manuscript.
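Because the subdataset names follow a regular pattern, they can be parsed to select groups of subdatasets programmatically. A minimal sketch, assuming only the naming patterns that appear in the list above (`rattled-<T>`, `rattled-<T>-subsampled`, `rattled-relax`, `aimd-from-PBE-<T>-<npt|nvt>`); `parse_subdataset` and `SubdatasetInfo` are hypothetical helpers, not part of fairchem, and the physical meaning of `T` is described in the manuscript.

```python
import re
from typing import NamedTuple, Optional


class SubdatasetInfo(NamedTuple):
    family: str              # "rattled", "rattled-relax" or "aimd-from-PBE"
    T: Optional[int]         # numeric token from the name, if any
    ensemble: Optional[str]  # "npt" or "nvt" for AIMD subdatasets
    subsampled: bool


_RATTLED = re.compile(r"^rattled-(\d+)(-subsampled)?$")
_AIMD = re.compile(r"^aimd-from-PBE-(\d+)-(npt|nvt)$")


def parse_subdataset(name: str) -> SubdatasetInfo:
    """Split a subdataset name into its components."""
    if name == "rattled-relax":
        return SubdatasetInfo("rattled-relax", None, None, False)
    m = _RATTLED.match(name)
    if m:
        return SubdatasetInfo("rattled", int(m.group(1)), None, m.group(2) is not None)
    m = _AIMD.match(name)
    if m:
        return SubdatasetInfo("aimd-from-PBE", int(m.group(1)), m.group(2), False)
    raise ValueError(f"unrecognized subdataset name: {name!r}")


names = [
    "rattled-1000", "rattled-1000-subsampled",
    "aimd-from-PBE-1000-nvt", "aimd-from-PBE-3000-nvt",
    "aimd-from-PBE-3000-npt", "rattled-relax",
]
# Select only the NVT AIMD subdatasets
nvt_only = [n for n in names if parse_subdataset(n).ensemble == "nvt"]
# nvt_only == ["aimd-from-PBE-1000-nvt", "aimd-from-PBE-3000-nvt"]
```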

File contents and downloads

OMat24 train split

| Sub-dataset | No. structures | File size | Download |
|---|---|---|---|
| rattled-1000 | 122,937 | 21 GB | rattled-1000.tar.gz |
| rattled-1000-subsampled | 41,786 | 7.1 GB | rattled-1000-subsampled.tar.gz |
| rattled-500 | 75,167 | 13 GB | rattled-500.tar.gz |
| rattled-500-subsampled | 43,068 | 7.3 GB | rattled-500-subsampled.tar.gz |
| rattled-300 | 68,593 | 12 GB | rattled-300.tar.gz |
| rattled-300-subsampled | 37,393 | 6.4 GB | rattled-300-subsampled.tar.gz |
| aimd-from-PBE-1000-npt | 223,574 | 26 GB | aimd-from-PBE-1000-npt.tar.gz |
| aimd-from-PBE-1000-nvt | 215,589 | 24 GB | aimd-from-PBE-1000-nvt.tar.gz |
| aimd-from-PBE-3000-npt | 65,244 | 25 GB | aimd-from-PBE-3000-npt.tar.gz |
| aimd-from-PBE-3000-nvt | 84,063 | 32 GB | aimd-from-PBE-3000-nvt.tar.gz |
| rattled-relax | 99,968 | 12 GB | rattled-relax.tar.gz |
| Total | 1,077,382 | 185.8 GB | |
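If you only need some of the train subdatasets, summing the archive sizes from the table gives a quick estimate of the download required. These are the compressed archive sizes listed above, so disk usage after extraction may differ. `download_size_gb` is a hypothetical helper:

```python
# Archive sizes (GB) of the OMat24 train split, copied from the table above
TRAIN_SIZES_GB = {
    "rattled-1000": 21,
    "rattled-1000-subsampled": 7.1,
    "rattled-500": 13,
    "rattled-500-subsampled": 7.3,
    "rattled-300": 12,
    "rattled-300-subsampled": 6.4,
    "aimd-from-PBE-1000-npt": 26,
    "aimd-from-PBE-1000-nvt": 24,
    "aimd-from-PBE-3000-npt": 25,
    "aimd-from-PBE-3000-nvt": 32,
    "rattled-relax": 12,
}


def download_size_gb(names):
    """Sum the archive sizes of the selected subdatasets (rounded to 0.1 GB)."""
    return round(sum(TRAIN_SIZES_GB[n] for n in names), 1)


print(download_size_gb(TRAIN_SIZES_GB))  # 185.8, matching the Total row
print(download_size_gb(["rattled-relax", "rattled-1000-subsampled"]))  # 19.1
```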

OMat24 val split (a 1M subset of the 5M val split, used when training the eqV2 models)

NOTE: The original validation sets contained duplicated structures. Corrected validation sets were uploaded on 20/12/24. Please see this issue for more details, and re-download the corrected validation sets if needed.

| Sub-dataset | No. structures | File size | Download |
|---|---|---|---|
| rattled-1000 | 117,004 | 218 MB | rattled-1000.tar.gz |
| rattled-1000-subsampled | 39,785 | 77 MB | rattled-1000-subsampled.tar.gz |
| rattled-500 | 71,522 | 135 MB | rattled-500.tar.gz |
| rattled-500-subsampled | 41,021 | 79 MB | rattled-500-subsampled.tar.gz |
| rattled-300 | 65,235 | 122 MB | rattled-300.tar.gz |
| rattled-300-subsampled | 35,579 | 69 MB | rattled-300-subsampled.tar.gz |
| aimd-from-PBE-1000-npt | 212,737 | 261 MB | aimd-from-PBE-1000-npt.tar.gz |
| aimd-from-PBE-1000-nvt | 205,165 | 251 MB | aimd-from-PBE-1000-nvt.tar.gz |
| aimd-from-PBE-3000-npt | 62,130 | 282 MB | aimd-from-PBE-3000-npt.tar.gz |
| aimd-from-PBE-3000-nvt | 79,977 | 364 MB | aimd-from-PBE-3000-nvt.tar.gz |
| rattled-relax | 95,206 | 118 MB | rattled-relax.tar.gz |
| Total | 1,025,361 | 1.98 GB | |
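The downloads are `.tar.gz` archives that must be extracted before the data can be read. A minimal sketch using only the Python standard library; `extract_archive` is a hypothetical helper, and the layout inside each archive (for example, whether it contains a top-level directory named after the subdataset) is an assumption you should check after extracting.

```python
import tarfile
from pathlib import Path


def extract_archive(archive: Path, dest: Path) -> None:
    """Extract a downloaded .tar.gz archive into dest."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        # Use the safer "data" extraction filter when this Python version provides it
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)


# Example (paths are placeholders):
# extract_archive(Path("rattled-relax.tar.gz"), Path("/path/to/omat24/val"))
```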

sAlex Dataset

We also provide the sAlex dataset used for fine-tuning our OMat models. sAlex is a subsampled, Matbench-Discovery-compliant version of the original Alexandria dataset. It was created by removing structures matched in WBM and by only sampling structures along a trajectory with an energy difference greater than 10 meV/atom. For full details, please see the manuscript.
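The energy-based subsampling step can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' exact procedure: it assumes energies are compared per atom, against the last kept frame, with the first frame always kept. `subsample_trajectory` is a hypothetical helper.

```python
from typing import List, Sequence


def subsample_trajectory(energies_per_atom: Sequence[float],
                         threshold_ev: float = 0.010) -> List[int]:
    """Return indices of trajectory frames to keep.

    A frame is kept when its energy per atom differs from the last kept
    frame by more than threshold_ev (10 meV/atom by default).
    """
    if not energies_per_atom:
        return []
    kept = [0]  # assumption: the first frame is always kept
    for i in range(1, len(energies_per_atom)):
        if abs(energies_per_atom[i] - energies_per_atom[kept[-1]]) > threshold_ev:
            kept.append(i)
    return kept


# Energies in eV/atom along a made-up relaxation trajectory
traj = [-3.000, -3.004, -3.015, -3.018, -3.030, -3.031]
print(subsample_trajectory(traj))  # [0, 2, 4]
```

Comparing against the last kept frame, rather than the immediately preceding one, avoids keeping long runs of frames whose energies drift slowly in small steps.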

| Dataset | Split | No. structures | File size | Download |
|---|---|---|---|---|
| sAlex | train | 10,447,765 | 7.6 GB | train.tar.gz |
| sAlex | val | 553,218 | 408 MB | val.tar.gz |

Getting ASE atoms objects

Dataset files are written as AseLMDBDatabase objects, an implementation of the ASE Database interface in LMDB format. A single *.aselmdb file can be read and queried like any other ASE DB.

You can also read many DB files at once and access atoms objects using the AseDBDataset class.

For example, to read the rattled-relax subdataset:

```python
from fairchem.core.datasets import AseDBDataset

dataset_path = "/path/to/omat24/train/rattled-relax"
config_kwargs = {}  # see the tutorial on additional configuration options

dataset = AseDBDataset(config=dict(src=dataset_path, **config_kwargs))

# atoms objects can be retrieved by index
atoms = dataset.get_atoms(0)
```

To read more than one subdataset, pass a list of subdataset paths:

```python
from fairchem.core.datasets import AseDBDataset

config_kwargs = {}  # see the tutorial on additional configuration options
dataset_paths = [
    "/path/to/omat24/train/rattled-relax",
    "/path/to/omat24/train/rattled-1000-subsampled",
    "/path/to/omat24/train/rattled-1000",
]
dataset = AseDBDataset(config=dict(src=dataset_paths, **config_kwargs))
```

To read the entire OMat24 training or validation split, pass the paths to all of its subdatasets.
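Building that list by hand is error-prone, so it can help to generate it from the subdataset names in the download tables. A minimal sketch, assuming each archive was extracted to `<root>/<split>/<subdataset>` as in the examples above; `omat24_split_paths` and `OMAT24_SUBDATASETS` are hypothetical names, and `/path/to/omat24` is a placeholder.

```python
from pathlib import Path
from typing import List

# Subdataset directory names, as listed in the download tables
OMAT24_SUBDATASETS = [
    "rattled-1000", "rattled-1000-subsampled",
    "rattled-500", "rattled-500-subsampled",
    "rattled-300", "rattled-300-subsampled",
    "aimd-from-PBE-1000-npt", "aimd-from-PBE-1000-nvt",
    "aimd-from-PBE-3000-npt", "aimd-from-PBE-3000-nvt",
    "rattled-relax",
]


def omat24_split_paths(root: str, split: str = "train") -> List[str]:
    """Return the paths of all subdatasets for one split.

    Assumes each subdataset lives in <root>/<split>/<subdataset>.
    """
    if split not in ("train", "val"):
        raise ValueError(f"unknown split: {split!r}")
    return [str(Path(root) / split / name) for name in OMAT24_SUBDATASETS]


paths = omat24_split_paths("/path/to/omat24", "train")
```

The resulting list can be passed as `src` in the same way as `dataset_paths` in the multi-subdataset example, e.g. `AseDBDataset(config=dict(src=paths))`.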

Citing OMat24

The OMat24 dataset is licensed under a Creative Commons Attribution 4.0 License.

Please consider citing the following papers in any publications that use this dataset:

@article{barroso_omat24,
  title={Open Materials 2024 (OMat24) Inorganic Materials Dataset and Models},
  author={Barroso-Luque, Luis and Shuaibi, Muhammed and Fu, Xiang and Wood, Brandon and Dzamba, Misko and Gao, Meng and Rizvi, Ammar and Zitnick, C. Lawrence and Ulissi, Zachary W.},
  journal={arXiv preprint arXiv:2410.12771},
  year={2024}
}
@article{schmidt_2023_machine,
  title={Machine-Learning-Assisted Determination of the Global Zero-Temperature Phase Diagram of Materials},
  author={Schmidt, Jonathan and Hoffmann, Noah and Wang, Hai-Chen and Borlido, Pedro and Carri{\c{c}}o, Pedro JMA and Cerqueira, Tiago FT and Botti, Silvana and Marques, Miguel AL},
  journal={Advanced Materials},
  volume={35},
  number={22},
  pages={2210788},
  year={2023},
  url={https://onlinelibrary.wiley.com/doi/full/10.1002/adma.202210788},
  publisher={Wiley Online Library}
}