core.launchers.slurm_launch#

Classes#

SlurmSPMDProgram

Entrypoint for an SPMD program launched via submitit on Slurm.

Functions#

_get_slurm_env(→ fairchem.core.launchers.api.SlurmEnv)

map_job_config_to_dist_config(→ dict)

remove_runner_state_from_submission(→ None)

runner_wrapper(config[, run_type])

_set_seeds(→ None)

_set_deterministic_mode(→ None)

slurm_launch(→ None)

local_launch(cfg, log_dir)

Launch locally with torch elastic (for more than one worker) or as a single process.

Module Contents#

core.launchers.slurm_launch._get_slurm_env() fairchem.core.launchers.api.SlurmEnv#
core.launchers.slurm_launch.map_job_config_to_dist_config(job_cfg: fairchem.core.launchers.api.JobConfig) dict#
core.launchers.slurm_launch.remove_runner_state_from_submission(log_folder: str, job_id: str) None#
core.launchers.slurm_launch.runner_wrapper(config: omegaconf.DictConfig, run_type: fairchem.core.launchers.api.RunType = RunType.RUN)#
core.launchers.slurm_launch._set_seeds(seed: int) None#
core.launchers.slurm_launch._set_deterministic_mode() None#
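
_set_seeds and _set_deterministic_mode are documented only by their signatures. The sketch below illustrates the kind of global seeding and deterministic-mode setup such helpers typically perform in PyTorch code; it is an assumption-based illustration, not the fairchem implementation.

```python
# Illustrative sketch only -- not the fairchem implementation.
import os
import random

import numpy as np
import torch


def set_seeds_sketch(seed: int) -> None:
    # Seed Python, NumPy, and torch (CPU and every visible GPU).
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)


def set_deterministic_mode_sketch() -> None:
    # Trade throughput for reproducibility: deterministic kernels,
    # no cuDNN autotuning.
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
    torch.use_deterministic_algorithms(True)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```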
class core.launchers.slurm_launch.SlurmSPMDProgram#

Bases: submitit.helpers.Checkpointable

Entrypoint for an SPMD program launched via submitit on Slurm. This assumes all ranks run an identical copy of this code.

config = None#
runner = None#
reducer = None#
__call__(dict_config: omegaconf.DictConfig, run_type: fairchem.core.launchers.api.RunType = RunType.RUN) None#
_init_logger() None#
checkpoint(*args, **kwargs) submitit.helpers.DelayedSubmission#

Resubmits the same callable with the same arguments.
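
The checkpoint method follows the standard submitit Checkpointable contract: when the Slurm job is preempted or times out, submitit calls checkpoint and requeues the returned DelayedSubmission. Below is a minimal sketch of that pattern; the class and its stubbed __call__ are illustrative, and only the resubmission mechanics are real submitit API.

```python
# Minimal sketch of the submitit Checkpointable pattern used by SlurmSPMDProgram.
import submitit
from omegaconf import DictConfig


class SPMDProgramSketch(submitit.helpers.Checkpointable):
    def __call__(self, dict_config: DictConfig, run_type=None) -> None:
        # Every rank runs this same entrypoint; the real class initializes
        # logging and distributed state, then runs the configured job.
        ...

    def checkpoint(self, *args, **kwargs) -> submitit.helpers.DelayedSubmission:
        # Resubmit the same callable with the same arguments on preemption/timeout.
        return submitit.helpers.DelayedSubmission(self, *args, **kwargs)
```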

core.launchers.slurm_launch.slurm_launch(cfg: omegaconf.DictConfig, log_dir: str) None#
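
A hedged usage sketch for slurm_launch: load an OmegaConf config and submit it to Slurm. The config path, log directory, and config contents are placeholders; the actual job-config schema is defined elsewhere in fairchem.

```python
# Usage sketch; paths and config contents are hypothetical.
from omegaconf import OmegaConf

from fairchem.core.launchers.slurm_launch import slurm_launch

cfg = OmegaConf.load("configs/my_job.yaml")       # hypothetical job config
slurm_launch(cfg, log_dir="/checkpoint/me/logs")  # hypothetical log directory
```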
core.launchers.slurm_launch.local_launch(cfg: omegaconf.DictConfig, log_dir: str)#

Launch locally with torch elastic (for more than one worker) or as a single process.
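
The same config can be run locally instead of on Slurm: with more than one worker configured, local_launch goes through torch elastic; otherwise it runs a single process. A hedged sketch, again with hypothetical paths:

```python
# Usage sketch; paths and config contents are hypothetical.
from omegaconf import OmegaConf

from fairchem.core.launchers.slurm_launch import local_launch

cfg = OmegaConf.load("configs/my_job.yaml")  # hypothetical job config
local_launch(cfg, log_dir="./local_logs")    # hypothetical log directory
```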