core.launchers.slurm_launch#
Classes#
- `SlurmSPMDProgram`: Entrypoint for a SPMD program launched via submitit on slurm.
Functions#
- `_get_slurm_env`
- `map_job_config_to_dist_config`
- `remove_runner_state_from_submission`
- `runner_wrapper`
- `_set_seeds`
- `_set_deterministic_mode`
- `slurm_launch`
- `local_launch`: Launch locally with torch elastic (for >1 workers) or just single process
Module Contents#
- core.launchers.slurm_launch._get_slurm_env() fairchem.core.launchers.api.SlurmEnv#
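A minimal sketch of what a `_get_slurm_env`-style helper typically does: read the environment variables SLURM exports into each task and fall back to single-process defaults outside an allocation. The `SlurmEnvSketch` dataclass and its field names are assumptions standing in for `fairchem.core.launchers.api.SlurmEnv`, whose actual fields are not shown here.

```python
from __future__ import annotations

import os
from dataclasses import dataclass


@dataclass
class SlurmEnvSketch:
    """Hypothetical stand-in for fairchem.core.launchers.api.SlurmEnv."""

    job_id: str | None
    global_rank: int
    local_rank: int
    world_size: int


def get_slurm_env_sketch() -> SlurmEnvSketch:
    # SLURM exports these variables into every task it launches; outside a
    # SLURM allocation they are absent, so default to a single process.
    return SlurmEnvSketch(
        job_id=os.environ.get("SLURM_JOB_ID"),
        global_rank=int(os.environ.get("SLURM_PROCID", 0)),
        local_rank=int(os.environ.get("SLURM_LOCALID", 0)),
        world_size=int(os.environ.get("SLURM_NTASKS", 1)),
    )
```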
- core.launchers.slurm_launch.map_job_config_to_dist_config(job_cfg: fairchem.core.launchers.api.JobConfig) dict#
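A sketch of the kind of mapping `map_job_config_to_dist_config` performs: deriving the environment torch.distributed's `env://` initialization reads from job-level settings. The function name, parameters, and defaults below are illustrative assumptions, not the actual `JobConfig` fields.

```python
def map_to_dist_config_sketch(num_nodes: int, gpus_per_node: int,
                              main_addr: str = "localhost",
                              main_port: int = 29500) -> dict:
    # torch.distributed's env:// initialization reads these keys;
    # RANK and LOCAL_RANK are filled in per process at launch time.
    return {
        "MASTER_ADDR": main_addr,
        "MASTER_PORT": str(main_port),
        "WORLD_SIZE": str(num_nodes * gpus_per_node),
    }
```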
- core.launchers.slurm_launch.remove_runner_state_from_submission(log_folder: str, job_id: str) None#
- core.launchers.slurm_launch.runner_wrapper(config: omegaconf.DictConfig, run_type: fairchem.core.launchers.api.RunType = RunType.RUN)#
- core.launchers.slurm_launch._set_seeds(seed: int) None#
- core.launchers.slurm_launch._set_deterministic_mode() None#
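The seeding helpers above follow a standard reproducibility pattern; a minimal sketch, assuming only the documented signatures. Whether `_set_seeds` offsets the seed by rank is an assumption; the PyTorch calls named in the comments are the usual ones for this purpose, not a claim about this module's exact implementation.

```python
import random


def set_seeds_sketch(seed: int, rank: int = 0) -> None:
    # Offsetting by rank is a common way to decorrelate per-worker
    # data-augmentation streams; whether _set_seeds does this is an
    # assumption.
    random.seed(seed + rank)
    # A real PyTorch run would also forward the seed to
    # numpy.random.seed, torch.manual_seed and torch.cuda.manual_seed_all.
    # Deterministic mode additionally sets
    # torch.use_deterministic_algorithms(True) and
    # torch.backends.cudnn.deterministic = True (benchmark = False).
```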
- class core.launchers.slurm_launch.SlurmSPMDProgram#
Bases: `submitit.helpers.Checkpointable`

Entrypoint for a SPMD program launched via submitit on slurm. This assumes all ranks run an identical copy of this code.
- config = None#
- runner = None#
- reducer = None#
- __call__(dict_config: omegaconf.DictConfig, run_type: fairchem.core.launchers.api.RunType = RunType.RUN) None#
- _init_logger() None#
- checkpoint(*args, **kwargs) submitit.helpers.DelayedSubmission#
Resubmits the same callable with the same arguments
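The `checkpoint` contract above is submitit's preemption-requeue pattern; the sketch below illustrates it with plain stand-in classes (`DelayedSubmissionSketch`, `SPMDEntrypointSketch` are hypothetical names, not the real submitit or fairchem types).

```python
class DelayedSubmissionSketch:
    """Stand-in for submitit.helpers.DelayedSubmission: a frozen call."""

    def __init__(self, fn, *args, **kwargs):
        self.fn, self.args, self.kwargs = fn, args, kwargs

    def run(self):
        return self.fn(*self.args, **self.kwargs)


class SPMDEntrypointSketch:
    """Stand-in for SlurmSPMDProgram's requeue contract."""

    def __call__(self, dict_config):
        # The real class builds the runner and starts training here,
        # restoring any runner state saved before preemption.
        return f"ran with {dict_config}"

    def checkpoint(self, *args, **kwargs):
        # Resubmit the same callable (self) with the same arguments, so
        # the requeued SLURM job resumes transparently.
        return DelayedSubmissionSketch(self, *args, **kwargs)
```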
- core.launchers.slurm_launch.slurm_launch(cfg: omegaconf.DictConfig, log_dir: str) None#
- core.launchers.slurm_launch.local_launch(cfg: omegaconf.DictConfig, log_dir: str)#
Launch locally with torch elastic (for >1 workers) or just single process
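The decision `local_launch` describes can be sketched as a simple dispatch; here a plain callable stands in for the torch elastic launcher, and the function name and parameters are illustrative assumptions.

```python
def local_launch_sketch(entrypoint, num_workers: int, elastic_launch=None):
    # More than one worker: hand off to an elastic launcher (the real
    # code uses torch elastic), which spawns num_workers ranks.
    if num_workers > 1:
        return elastic_launch(entrypoint, num_workers)
    # Single worker: skip the process-group machinery and call directly.
    return entrypoint()
```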