core._cli#

Copyright (c) Meta Platforms, Inc. and affiliates.

This source code is licensed under the MIT license found in the LICENSE file in the root directory of this source tree.

Attributes#

ALLOWED_TOP_LEVEL_KEYS

LOG_DIR_NAME

CHECKPOINT_DIR_NAME

RESULTS_DIR

CONFIG_FILE_NAME

PREEMPTION_STATE_DIR_NAME

Classes#

SchedulerType

Enum where members are also (and must be) strings

DeviceType

Enum where members are also (and must be) strings

RunType

Enum where members are also (and must be) strings

DistributedInitMethod

Enum where members are also (and must be) strings

SlurmConfig

SchedulerConfig

SlurmEnv

Metadata

JobConfig

Submitit

Derived callable classes are requeued after timeout with their current state dumped at checkpoint.

Functions#

_set_seeds(→ None)

_set_deterministic_mode(→ None)

_get_slurm_env(→ SlurmEnv)

remove_runner_state_from_submission(→ None)

map_job_config_to_dist_config(→ dict)

get_canonical_config(→ omegaconf.DictConfig)

get_hydra_config_from_yaml(→ omegaconf.DictConfig)

_runner_wrapper(config[, run_type])

main([args, override_args])

Module Contents#

core._cli.ALLOWED_TOP_LEVEL_KEYS#
core._cli.LOG_DIR_NAME = 'logs'#
core._cli.CHECKPOINT_DIR_NAME = 'checkpoints'#
core._cli.RESULTS_DIR = 'results'#
core._cli.CONFIG_FILE_NAME = 'canonical_config.yaml'#
core._cli.PREEMPTION_STATE_DIR_NAME = 'preemption_state'#
class core._cli.SchedulerType#

Bases: fairchem.core.common.utils.StrEnum

Enum where members are also (and must be) strings

LOCAL = 'local'#
SLURM = 'slurm'#
class core._cli.DeviceType#

Bases: fairchem.core.common.utils.StrEnum

Enum where members are also (and must be) strings

CPU = 'cpu'#
CUDA = 'cuda'#
class core._cli.RunType#

Bases: fairchem.core.common.utils.StrEnum

Enum where members are also (and must be) strings

RUN = 'run'#
REDUCE = 'reduce'#
class core._cli.DistributedInitMethod#

Bases: fairchem.core.common.utils.StrEnum

Enum where members are also (and must be) strings

TCP = 'tcp'#
FILE = 'file'#
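
All of these enums derive from StrEnum, so members are ordinary strings that can be parsed from, and compared against, the plain string values used in configs. A minimal sketch, assuming the module is importable as fairchem.core._cli (consistent with the base-class path shown above):

from fairchem.core._cli import DeviceType, SchedulerType

mode = SchedulerType("slurm")        # construct from the string value found in a config
assert mode is SchedulerType.SLURM
assert mode == "slurm"               # StrEnum members compare equal to their string values
assert DeviceType.CUDA.value == "cuda"
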
class core._cli.SlurmConfig#
mem_gb: int = 80#
timeout_hr: int = 168#
cpus_per_task: int = 8#
partition: str | None = None#
qos: str | None = None#
account: str | None = None#
class core._cli.SchedulerConfig#
mode: SchedulerType#
distributed_init_method: DistributedInitMethod#
ranks_per_node: int = 1#
num_nodes: int = 1#
num_array_jobs: int = 1#
slurm: SlurmConfig#
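
SlurmConfig and SchedulerConfig are declared as attribute/default pairs, so the sketch below constructs them by keyword, assuming they are dataclass-style classes. The values are hypothetical; only the field names and defaults come from the listings above.

from fairchem.core._cli import (
    DistributedInitMethod,
    SchedulerConfig,
    SchedulerType,
    SlurmConfig,
)

slurm = SlurmConfig(mem_gb=160, timeout_hr=72, cpus_per_task=16, partition="gpu")
scheduler = SchedulerConfig(
    mode=SchedulerType.SLURM,
    distributed_init_method=DistributedInitMethod.TCP,
    ranks_per_node=8,   # e.g. one rank per GPU on an 8-GPU node
    num_nodes=2,
    slurm=slurm,
)
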
class core._cli.SlurmEnv#
job_id: str | None = None#
raw_job_id: str | None = None#
array_job_id: str | None = None#
array_task_id: str | None = None#
restart_count: str | None = None#
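
The SlurmEnv fields mirror standard SLURM environment variables. The sketch below is an illustrative way to populate such a record from os.environ; it is not the library's _get_slurm_env implementation, and it assumes SlurmEnv is a dataclass-style class accepting keyword arguments.

import os

from fairchem.core._cli import SlurmEnv

def read_slurm_env() -> SlurmEnv:
    # Hypothetical helper: collect SLURM identifiers when running under SLURM,
    # leaving every field as None otherwise.
    return SlurmEnv(
        job_id=os.environ.get("SLURM_JOB_ID"),
        raw_job_id=os.environ.get("SLURM_JOB_ID"),
        array_job_id=os.environ.get("SLURM_ARRAY_JOB_ID"),
        array_task_id=os.environ.get("SLURM_ARRAY_TASK_ID"),
        restart_count=os.environ.get("SLURM_RESTART_COUNT"),
    )
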
class core._cli.Metadata#
commit: str#
log_dir: str#
checkpoint_dir: str#
results_dir: str#
config_path: str#
preemption_checkpoint_dir: str#
cluster_name: str#
array_job_num: int = 0#
slurm_env: SlurmEnv#
class core._cli.JobConfig#
run_name: str#
timestamp_id: str#
run_dir: str#
device_type: DeviceType#
debug: bool = False#
scheduler: SchedulerConfig#
logger: dict | None = None#
seed: int = 0#
deterministic: bool = False#
runner_state_path: str | None = None#
metadata: Metadata | None = None#
graph_parallel_group_size: int | None = None#
__post_init__() None#
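
JobConfig gathers the job-level settings used by the CLI. A hedged sketch of checking a partial config against it with OmegaConf structured configs, assuming JobConfig is an OmegaConf-compatible dataclass (the field values below are made up):

from omegaconf import OmegaConf

from fairchem.core._cli import JobConfig

schema = OmegaConf.structured(JobConfig)    # typed schema; fields without defaults stay missing
user_cfg = OmegaConf.create({"run_name": "demo", "run_dir": "/tmp/runs", "debug": True})
job_cfg = OmegaConf.merge(schema, user_cfg) # type-checked merge; unknown keys or bad types raise
print(job_cfg.debug, job_cfg.seed)          # True 0
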
core._cli._set_seeds(seed: int) None#
core._cli._set_deterministic_mode() None#
core._cli._get_slurm_env() SlurmEnv#
core._cli.remove_runner_state_from_submission(log_folder: str, job_id: str) None#
class core._cli.Submitit#

Bases: submitit.helpers.Checkpointable

Derived callable classes are requeued after timeout with their current state dumped at checkpoint.

The __call__ method must be implemented to make your class a callable.

Note

The following implementation of the checkpoint method resubmits the full current state of the callable (self) with the initial arguments. You may want to override this method to curate the state (e.g. dump a neural network to a standard format and remove it from the state so that it is not pickled) and to change or remove the initial parameters.

config = None#
runner = None#
reducer = None#
__call__(dict_config: omegaconf.DictConfig, run_type: RunType = RunType.RUN) None#
_init_logger() None#
checkpoint(*args, **kwargs) submitit.helpers.DelayedSubmission#

Resubmits the same callable with the same arguments
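
The note above describes submitit's checkpointing contract in general. A minimal, self-contained sketch of that contract with a generic task (this is not the Submitit class documented here; the work done in __call__ is illustrative):

import submitit

class ExampleTask(submitit.helpers.Checkpointable):
    def __init__(self) -> None:
        self.progress = 0

    def __call__(self, total_steps: int) -> int:
        # If the job was requeued after timeout/preemption, self.progress was
        # restored from the pickled state and the loop resumes where it stopped.
        while self.progress < total_steps:
            self.progress += 1
        return self.progress

    def checkpoint(self, *args, **kwargs) -> submitit.helpers.DelayedSubmission:
        # Default behavior: resubmit this callable, carrying its current state,
        # with the same arguments it was originally called with.
        return submitit.helpers.DelayedSubmission(self, *args, **kwargs)
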

core._cli.map_job_config_to_dist_config(job_cfg: JobConfig) dict#
core._cli.get_canonical_config(config: omegaconf.DictConfig) omegaconf.DictConfig#
core._cli.get_hydra_config_from_yaml(config_yml: str, overrides_args: list[str]) omegaconf.DictConfig#
core._cli._runner_wrapper(config: omegaconf.DictConfig, run_type: RunType = RunType.RUN)#
core._cli.main(args: argparse.Namespace | None = None, override_args: list[str] | None = None)#
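
main is the command-line entry point: it loads a YAML config, canonicalizes it, and then dispatches the run either locally or through submitit/SLURM according to the scheduler configuration. A hedged sketch of driving the config-loading half programmatically (the YAML path and the override key are hypothetical; overrides_args is assumed to take Hydra-style overrides, as suggested by get_hydra_config_from_yaml's name and signature):

from fairchem.core._cli import get_canonical_config, get_hydra_config_from_yaml

cfg = get_hydra_config_from_yaml("configs/my_run.yaml", ["job.debug=True"])
cfg = get_canonical_config(cfg)   # normalize/validate the raw config before running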