core.components.benchmark.materials_discovery_reducer#
Copyright (c) Meta Platforms, Inc. and affiliates.
This source code is licensed under the MIT license found in the LICENSE file in the root directory of this source tree.
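Attributes#
mbd_installed |
MP2020Compatibility |
Classes#
MaterialsDiscoveryReducer | A common pandas DataFrame reducer for benchmarks
Functions#
as_dict_handler(obj) | Pass this to json.dump(default=) or as pandas.to_json(default_handler=) to serialize Python classes with as_dict().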
Module Contents#
- core.components.benchmark.materials_discovery_reducer.mbd_installed = True#
- core.components.benchmark.materials_discovery_reducer.MP2020Compatibility#
- core.components.benchmark.materials_discovery_reducer.as_dict_handler(obj: Any) dict[str, Any] | None #
Pass this to json.dump(default=) or to pandas.to_json(default_handler=) to serialize Python classes that implement as_dict(). Warning: Objects without an as_dict() method are replaced with None in the serialized data.
From matbench_discovery: janosh/matbench-discovery
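The behavior described above can be illustrated with a minimal sketch. The function `as_dict_handler_sketch` and the classes `Lattice` and `Opaque` are hypothetical stand-ins written for this example; they are not the library's implementation, only one way to get the documented behavior (objects with `as_dict()` are serialized, everything else becomes `null`).

```python
from __future__ import annotations

import json
from typing import Any


def as_dict_handler_sketch(obj: Any) -> dict[str, Any] | None:
    """Return obj.as_dict() when available, otherwise None (assumed behavior)."""
    try:
        return obj.as_dict()
    except AttributeError:
        # Objects without as_dict() are replaced with None in the output.
        return None


class Lattice:
    """Hypothetical class exposing as_dict(), for illustration only."""

    def __init__(self, a: float) -> None:
        self.a = a

    def as_dict(self) -> dict[str, Any]:
        return {"a": self.a}


class Opaque:
    """Hypothetical class without an as_dict() method."""


payload = {"lattice": Lattice(3.5), "other": Opaque()}

# json calls the handler only for objects it cannot serialize natively.
serialized = json.dumps(payload, default=as_dict_handler_sketch)
print(serialized)
```

Because unsupported objects are silently dropped to `null`, it may be worth checking the serialized output for unexpected missing values.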
- class core.components.benchmark.materials_discovery_reducer.MaterialsDiscoveryReducer(benchmark_name: str, target_data_path: str, cse_data_path: str | None = None, elemental_references_path: str | None = None, index_name: str | None = None, corrections: pymatgen.entries.compatibility.Compatibility | None = MP2020Compatibility, max_error_threshold: float = 5.0, analyze_geo_opt: bool = True, geo_symprec: float = 1e-05)#
Bases: fairchem.core.components.benchmark.JsonDFReducer
A common pandas DataFrame reducer for benchmarks
Results are assumed to be saved as JSON files that can be read into pandas DataFrames. Only the mean absolute error is computed, over the columns common to the predicted results and the target data.
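As a rough illustration of computing the mean absolute error over common columns, the sketch below uses two small hypothetical DataFrames. The column names and values are invented for the example, and the reducer's actual implementation may differ.

```python
import pandas as pd

# Hypothetical predictions and targets sharing an index of material IDs.
predictions = pd.DataFrame(
    {"energy": [-1.0, -2.5, -3.0], "volume": [10.0, 12.0, 9.0]},
    index=["m1", "m2", "m3"],
)
targets = pd.DataFrame(
    {"energy": [-1.5, -2.0, -3.0], "band_gap": [0.0, 1.1, 2.0]},
    index=["m1", "m2", "m3"],
)

# Only columns present in both frames are compared.
common_columns = predictions.columns.intersection(targets.columns)

# Mean absolute error per common column, with rows aligned on the index.
mae = (predictions[common_columns] - targets[common_columns]).abs().mean()
print(mae)
```

Here only `energy` appears in both frames, so it is the only column that receives a metric.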
- _corrections#
- _max_error_threshold#
- _elemental_references_path#
- _cse_data_path#
- _analyze_geo_opt#
- _geo_symprec#
- property runner_type: type[fairchem.core.components.calculate.RelaxationRunner]#
The runner type this reducer is associated with.
- static load_targets(path: str, index_name: str | None) pandas.DataFrame #
Load target data from a JSON file into a pandas DataFrame.
- Parameters:
path – Path to the target JSON file
index_name – Optional name of the column to use as index
- Returns:
DataFrame containing the target data, sorted by index
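A minimal sketch of the loading step described above, assuming the target file is in a format that `pandas.read_json` can parse directly. The helper name `load_targets_sketch`, the records layout, and the column names are assumptions made for this example.

```python
from __future__ import annotations

import io

import pandas as pd


def load_targets_sketch(path_or_buffer, index_name: str | None = None) -> pd.DataFrame:
    """Read JSON into a DataFrame, optionally set the index, and sort by it."""
    df = pd.read_json(path_or_buffer, orient="records")
    if index_name is not None:
        df = df.set_index(index_name)
    return df.sort_index()


# In-memory stand-in for a target JSON file (hypothetical content).
raw = io.StringIO(
    '[{"material_id": "m2", "e_form": -0.4}, {"material_id": "m1", "e_form": -1.2}]'
)
targets = load_targets_sketch(raw, index_name="material_id")
print(targets)
```

Sorting by index gives a deterministic row order, which makes it easier to align targets with predictions later.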
- static _load_elemental_ref_energies(elemental_references_path: str) dict[str, float] #
- static _load_computed_structure_entries(cse_data_path: str, results: pandas.DataFrame) pandas.DataFrame #
Convert prediction results to computed structure entries with updated energies and structures.
- Returns:
DataFrame of computed structure entries indexed by material IDs
- _apply_corrections(computed_structure_entries: list[pymatgen.entries.computed_entries.ComputedStructureEntry]) None #
Apply compatibility corrections to computed structure entries.
- Parameters:
computed_structure_entries – List of ComputedStructureEntry objects to apply corrections to
- Raises:
ValueError – If not all entries were successfully processed after applying corrections
- _analyze_relaxed_geometry(pred_structures: dict[str, pymatgen.core.Structure], target_structures: dict[str, pymatgen.core.Structure]) dict[str, float] #
Analyze the geometry of relaxed structures and calculate the RMSD with respect to the target structures.
- Parameters:
pred_structures – Dictionary mapping material IDs to predicted Structure objects
target_structures – Dictionary mapping material IDs to target Structure objects
- Returns:
Dictionary containing geometric analysis metrics
- join_results(results_dir: str, glob_pattern: str) pandas.DataFrame #
Join results from multiple relaxation JSON files into a single DataFrame.
Joins results for relaxed energy, applies compatibility corrections, and computes formation energy with respect to the MP reference structures in Matbench Discovery.
- Parameters:
results_dir – Directory containing result files
glob_pattern – Pattern to match result files
- Returns:
Combined DataFrame containing all results
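The formation energy step mentioned above can be sketched with the standard definition: the total energy minus the composition-weighted sum of elemental reference energies, divided by the number of atoms. The function name and all numeric values below are hypothetical, and the sketch leaves out the compatibility corrections the reducer applies beforehand.

```python
def formation_energy_per_atom(
    total_energy: float,
    composition: dict[str, int],
    elemental_refs: dict[str, float],
) -> float:
    """Formation energy per atom in the same units as the inputs (e.g. eV/atom).

    total_energy: energy of the whole cell.
    composition: number of atoms of each element in the cell.
    elemental_refs: reference energy per atom for each element.
    """
    n_atoms = sum(composition.values())
    reference_energy = sum(
        count * elemental_refs[element] for element, count in composition.items()
    )
    return (total_energy - reference_energy) / n_atoms


# Hypothetical numbers for a Li2O cell, chosen only to keep the arithmetic simple.
refs = {"Li": -2.0, "O": -5.0}
e_form = formation_energy_per_atom(-14.0, {"Li": 2, "O": 1}, refs)
print(e_form)
```

With these inputs the reference energy is 2 × (−2.0) + 1 × (−5.0) = −9.0, so the formation energy is (−14.0 − (−9.0)) / 3 per atom.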
- save_results(results: pandas.DataFrame, results_dir: str) None #
Save joined results to disk.
Saves the results in two formats:
1. CSV file containing only numerical data
2. JSON file containing all data including relaxed structures
- Parameters:
results – DataFrame containing the prediction results
results_dir – Directory path where result files will be saved
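The two-format output described above might look like the following sketch. The file names `results.csv` and `results.json`, the column names, and the dictionary standing in for a serialized structure are assumptions for this example, not the names the reducer uses.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

results = pd.DataFrame(
    {
        "e_form_per_atom": [-1.2, -0.4],
        # Stand-in for serialized relaxed structures.
        "structure": [{"sites": 2}, {"sites": 5}],
    },
    index=pd.Index(["m1", "m2"], name="material_id"),
)

with tempfile.TemporaryDirectory() as results_dir:
    out = Path(results_dir)

    # CSV: numeric columns only, so the file stays small and tabular.
    results.select_dtypes(include="number").to_csv(out / "results.csv")

    # JSON: every column, including nested structure data.
    results.to_json(out / "results.json")

    csv_header = (out / "results.csv").read_text().splitlines()[0]
    json_columns = sorted(json.loads((out / "results.json").read_text()))

print(csv_header)
print(json_columns)
```

Keeping the CSV numeric-only avoids embedding nested objects in a flat file, while the JSON retains everything needed to reconstruct the structures.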
- compute_metrics(results: pandas.DataFrame, run_name: str) pandas.DataFrame #
Compute Matbench Discovery metrics for relaxed energy and structure predictions.
- Parameters:
results – DataFrame containing prediction results with energy values
run_name – Identifier for the current evaluation run
- Returns:
DataFrame containing computed metrics for different material subsets
- log_metrics(metrics: pandas.DataFrame, run_name: str) None #
Log metrics to the configured logger if available.
- Parameters:
metrics – DataFrame containing the computed metrics
run_name – Name of the current run
- save_state(checkpoint_location: str, is_preemption: bool = False) bool #
Save the current state of the reducer to a checkpoint.
- Parameters:
checkpoint_location – Location to save the checkpoint
is_preemption – Whether the save is due to preemption
- Returns:
Success status of the save operation
- Return type:
bool
- load_state(checkpoint_location: str | None) None #
Load reducer state from a checkpoint.
- Parameters:
checkpoint_location – Location to load the checkpoint from, or None