core.launchers.slurm_launch#

Classes#

SlurmSPMDProgram

Entrypoint for an SPMD program launched via submitit on Slurm.

Functions#

_get_slurm_env(→ fairchem.core.launchers.api.SlurmEnv)

map_job_config_to_dist_config(→ dict)

remove_runner_state_from_submission(→ None)

runner_wrapper(config[, run_type])

_set_seeds(→ None)

_set_deterministic_mode(→ None)

slurm_launch(→ None)

local_launch(cfg, log_dir)

Launch locally with torch elastic (for more than one worker) or as a single process.

Module Contents#

core.launchers.slurm_launch._get_slurm_env() fairchem.core.launchers.api.SlurmEnv#
core.launchers.slurm_launch.map_job_config_to_dist_config(job_cfg: fairchem.core.launchers.api.JobConfig) dict#
core.launchers.slurm_launch.remove_runner_state_from_submission(log_folder: str, job_id: str) None#
core.launchers.slurm_launch.runner_wrapper(config: omegaconf.DictConfig, run_type: fairchem.core.launchers.api.RunType = RunType.RUN)#
core.launchers.slurm_launch._set_seeds(seed: int) None#
core.launchers.slurm_launch._set_deterministic_mode() None#
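
_set_seeds and _set_deterministic_mode are documented only by their signatures. The sketch below illustrates the kind of global seeding and deterministic-mode setup such helpers typically perform in PyTorch code; it is an assumption-based illustration, not the fairchem implementation.

```python
# Illustrative sketch only -- not the fairchem implementation.
import os
import random

import numpy as np
import torch


def set_seeds_sketch(seed: int) -> None:
    # Seed Python, NumPy, and torch (CPU and every visible GPU).
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)


def set_deterministic_mode_sketch() -> None:
    # Trade throughput for reproducibility: deterministic kernels,
    # no cuDNN autotuning.
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
    torch.use_deterministic_algorithms(True)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```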
class core.launchers.slurm_launch.SlurmSPMDProgram#

Bases: submitit.helpers.Checkpointable

Entrypoint for an SPMD program launched via submitit on Slurm. This assumes all ranks run an identical copy of this code.

config = None#
runner = None#
reducer = None#
__call__(dict_config: omegaconf.DictConfig, run_type: fairchem.core.launchers.api.RunType = RunType.RUN) None#
_init_logger() None#
checkpoint(*args, **kwargs) submitit.helpers.DelayedSubmission#

Resubmits the same callable with the same arguments.
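
The checkpoint method follows the standard submitit Checkpointable contract: when the Slurm job is preempted or times out, submitit calls checkpoint and requeues the returned DelayedSubmission. Below is a minimal sketch of that pattern; the class and its stubbed __call__ are illustrative, and only the resubmission mechanics are real submitit API.

```python
# Minimal sketch of the submitit Checkpointable pattern used by SlurmSPMDProgram.
import submitit
from omegaconf import DictConfig


class SPMDProgramSketch(submitit.helpers.Checkpointable):
    def __call__(self, dict_config: DictConfig, run_type=None) -> None:
        # Every rank runs this same entrypoint; the real class initializes
        # logging and distributed state, then runs the configured job.
        ...

    def checkpoint(self, *args, **kwargs) -> submitit.helpers.DelayedSubmission:
        # Resubmit the same callable with the same arguments on preemption/timeout.
        return submitit.helpers.DelayedSubmission(self, *args, **kwargs)
```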

core.launchers.slurm_launch.slurm_launch(cfg: omegaconf.DictConfig, log_dir: str) None#
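
A hedged usage sketch for slurm_launch: load an OmegaConf config and submit it to Slurm. The config path, log directory, and config contents are placeholders; the actual job-config schema is defined elsewhere in fairchem.

```python
# Usage sketch; paths and config contents are hypothetical.
from omegaconf import OmegaConf

from fairchem.core.launchers.slurm_launch import slurm_launch

cfg = OmegaConf.load("configs/my_job.yaml")       # hypothetical job config
slurm_launch(cfg, log_dir="/checkpoint/me/logs")  # hypothetical log directory
```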
core.launchers.slurm_launch.local_launch(cfg: omegaconf.DictConfig, log_dir: str)#

Launch locally with torch elastic (for more than one worker) or as a single process.
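
The same config can be run locally instead of on Slurm: with more than one worker configured, local_launch goes through torch elastic; otherwise it runs a single process. A hedged sketch, again with hypothetical paths:

```python
# Usage sketch; paths and config contents are hypothetical.
from omegaconf import OmegaConf

from fairchem.core.launchers.slurm_launch import local_launch

cfg = OmegaConf.load("configs/my_job.yaml")  # hypothetical job config
local_launch(cfg, log_dir="./local_logs")    # hypothetical log directory
```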