core.components.benchmark.materials_discovery_reducer#

Copyright (c) Meta Platforms, Inc. and affiliates.

This source code is licensed under the MIT license found in the LICENSE file in the root directory of this source tree.

Attributes#

Classes#

MaterialsDiscoveryReducer

A common pandas DataFrame reducer for benchmarks

Functions#

as_dict_handler(→ dict[str, Any] | None)

Pass this to json.dump(default=) or to pandas.to_json(default_handler=) to serialize Python classes with as_dict().

Module Contents#

core.components.benchmark.materials_discovery_reducer.mbd_installed = True#
core.components.benchmark.materials_discovery_reducer.MP2020Compatibility#
core.components.benchmark.materials_discovery_reducer.as_dict_handler(obj: Any) dict[str, Any] | None#

Pass this to json.dump(default=) or to pandas.to_json(default_handler=) to serialize Python classes with as_dict(). Warning: objects without an as_dict() method are replaced with None in the serialized data.

From matbench_discovery: janosh/matbench-discovery
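The behavior described above can be illustrated with a minimal sketch. It does not import the real handler: as_dict_handler_sketch is a local stand-in written from the description (call as_dict() when present, otherwise return None), and the Lattice class is hypothetical.

```python
from __future__ import annotations

import json
from typing import Any


def as_dict_handler_sketch(obj: Any) -> dict[str, Any] | None:
    """Stand-in for as_dict_handler, written from its docstring (an assumption)."""
    try:
        return obj.as_dict()
    except AttributeError:
        # Objects without as_dict() are replaced with None (JSON null).
        return None


class Lattice:
    """Hypothetical class exposing as_dict(), in the style of pymatgen objects."""

    def __init__(self, a: float) -> None:
        self.a = a

    def as_dict(self) -> dict[str, Any]:
        return {"a": self.a}


# json calls the default= hook only for objects it cannot serialize itself.
payload = {"lattice": Lattice(4.2), "opaque": object()}
serialized = json.dumps(payload, default=as_dict_handler_sketch)
print(serialized)
```

Note that the second value is silently written as null, which is the data loss the warning refers to.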

class core.components.benchmark.materials_discovery_reducer.MaterialsDiscoveryReducer(benchmark_name: str, target_data_path: str, cse_data_path: str | None = None, elemental_references_path: str | None = None, index_name: str | None = None, corrections: pymatgen.entries.compatibility.Compatibility | None = MP2020Compatibility, max_error_threshold: float = 5.0, analyze_geo_opt: bool = True, geo_symprec: float = 1e-05)#

Bases: fairchem.core.components.benchmark.JsonDFReducer

A common pandas DataFrame reducer for benchmarks

Results are assumed to be saved as JSON files that can be read into pandas DataFrames. Only the mean absolute error is computed, over the columns common to the predicted results and the target data.

_corrections#
_max_error_threshold#
_elemental_references_path#
_cse_data_path#
_analyze_geo_opt#
_geo_symprec#
property runner_type: type[fairchem.core.components.calculate.RelaxationRunner]#

The runner type this reducer is associated with.

static load_targets(path: str, index_name: str | None) pandas.DataFrame#

Load target data from a JSON file into a pandas DataFrame.

Parameters:
  • path – Path to the target JSON file

  • index_name – Optional name of the column to use as index

Returns:

DataFrame containing the target data, sorted by index

static _load_elemental_ref_energies(elemental_references_path: str) dict[str, float]#
static _load_computed_structure_entries(cse_data_path: str, results: pandas.DataFrame) pandas.DataFrame#

Convert prediction results to computed structure entries with updated energies and structures.

Returns:

DataFrame of computed structure entries indexed by material IDs

_apply_corrections(computed_structure_entries: list[pymatgen.entries.computed_entries.ComputedStructureEntry]) None#

Apply compatibility corrections to computed structure entries.

Parameters:

computed_structure_entries – List of ComputedStructureEntry objects to apply corrections to

Raises:

ValueError – If not all entries were successfully processed after applying corrections

_analyze_relaxed_geometry(pred_structures: dict[str, pymatgen.core.Structure], target_structures: dict[str, pymatgen.core.Structure]) dict[str, float]#

Analyze the geometry of relaxed structures and calculate the RMSD with respect to the target structures.

Parameters:
  • pred_structures – Dictionary mapping material IDs to predicted Structure objects

  • target_structures – Dictionary mapping material IDs to target Structure objects

Returns:

Dictionary containing geometric analysis metrics
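As a rough illustration of the RMSD quantity mentioned above, the sketch below computes a plain root-mean-square deviation between two coordinate lists. It assumes the sites are already matched one-to-one and ignores periodic boundaries, lattice normalization, and symmetry, so it is not the reducer's actual procedure; the function name and coordinates are hypothetical.

```python
import math
from collections.abc import Sequence

Coord = Sequence[float]


def plain_rmsd(pred: Sequence[Coord], target: Sequence[Coord]) -> float:
    """RMSD between two equally ordered lists of Cartesian coordinates."""
    if len(pred) != len(target) or not pred:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Sum of squared displacements over all sites and all components.
    squared = sum(
        (p - t) ** 2
        for site_pred, site_target in zip(pred, target)
        for p, t in zip(site_pred, site_target)
    )
    return math.sqrt(squared / len(pred))


# Made-up coordinates: each site is displaced by 0.1 along z.
pred_sites = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
target_sites = [(0.0, 0.0, 0.1), (1.0, 0.0, -0.1)]
print(plain_rmsd(pred_sites, target_sites))
```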

join_results(results_dir: str, glob_pattern: str) pandas.DataFrame#

Join results from multiple relaxation JSON files into a single DataFrame.

Joins results for relaxed energy, applies compatibility corrections, and computes the formation energy with respect to the MP reference structures in Matbench Discovery.

Parameters:
  • results_dir – Directory containing result files

  • glob_pattern – Pattern to match result files

Returns:

Combined DataFrame containing all results
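The formation energy step can be sketched with the usual per-atom definition: the total energy minus the composition-weighted elemental reference energies, divided by the number of atoms. This is an assumption about the definition in use, not the reducer's code; the function name and all numbers below are made up for illustration.

```python
from __future__ import annotations


def formation_energy_per_atom(
    total_energy: float,
    composition: dict[str, int],
    elemental_refs: dict[str, float],
) -> float:
    """Formation energy per atom relative to elemental reference energies.

    total_energy is for the whole cell (eV); elemental_refs are per atom (eV).
    """
    n_atoms = sum(composition.values())
    reference = sum(n * elemental_refs[el] for el, n in composition.items())
    return (total_energy - reference) / n_atoms


# Illustrative values only, not real reference energies.
refs = {"Fe": -8.0, "O": -5.0}
e_form = formation_energy_per_atom(-40.0, {"Fe": 2, "O": 3}, refs)
print(e_form)
```

A negative value means the compound is lower in energy than its elemental references under these assumed numbers.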

save_results(results: pandas.DataFrame, results_dir: str) None#

Save the joined results to disk.

Saves the results in two formats:
  • CSV file containing only numerical data

  • JSON file containing all data, including relaxed structures

Parameters:
  • results – DataFrame containing the prediction results

  • results_dir – Directory path where result files will be saved

compute_metrics(results: pandas.DataFrame, run_name: str) pandas.DataFrame#

Compute Matbench Discovery metrics for relaxed energy and structure predictions.

Parameters:
  • results – DataFrame containing prediction results with energy values

  • run_name – Identifier for the current evaluation run

Returns:

DataFrame containing computed metrics for different material subsets
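For orientation, discovery-style benchmarks typically report a regression error alongside a stable/unstable classification score. The sketch below assumes a material counts as stable when its energy above the convex hull is at or below a threshold of 0 eV/atom; the function name, the threshold, and the chosen metrics are assumptions for illustration and are not taken from this reducer.

```python
from __future__ import annotations

import math


def stability_metrics(
    e_hull_true: list[float],
    e_hull_pred: list[float],
    threshold: float = 0.0,
) -> dict[str, float]:
    """MAE plus precision, recall and F1 for stable-vs-unstable classification."""
    tp = fp = fn = 0
    for true, pred in zip(e_hull_true, e_hull_pred):
        true_stable = true <= threshold
        pred_stable = pred <= threshold
        tp += true_stable and pred_stable
        fp += pred_stable and not true_stable
        fn += true_stable and not pred_stable
    # Guard against empty classes so the sketch never divides by zero.
    precision = tp / (tp + fp) if tp + fp else math.nan
    recall = tp / (tp + fn) if tp + fn else math.nan
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else math.nan
    mae = sum(abs(t - p) for t, p in zip(e_hull_true, e_hull_pred)) / len(e_hull_true)
    return {"MAE": mae, "Precision": precision, "Recall": recall, "F1": f1}


# Made-up hull distances in eV/atom.
metrics = stability_metrics([-0.05, 0.02, -0.01, 0.10], [-0.02, -0.01, 0.03, 0.08])
print(metrics)
```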

log_metrics(metrics: pandas.DataFrame, run_name: str) None#

Log metrics to the configured logger if available.

Parameters:
  • metrics – DataFrame containing the computed metrics

  • run_name – Name of the current run

save_state(checkpoint_location: str, is_preemption: bool = False) bool#

Save the current state of the reducer to a checkpoint.

Parameters:
  • checkpoint_location – Location to save the checkpoint

  • is_preemption – Whether the save is due to preemption

Returns:

Success status of the save operation

Return type:

bool

load_state(checkpoint_location: str | None) None#

Load reducer state from a checkpoint.

Parameters:

checkpoint_location – Location to load the checkpoint from, or None