core.common.test_utils

Classes

ForkedPdb

A Pdb subclass that may be used from a forked multiprocessing child

PGConfig

Functions

init_env_rank_and_launch_test(→ None)

init_pg_and_rank_and_launch_test(→ None)

spawn_multi_process(→ list[Any])

Spawn a single-node, multi-rank function.

init_local_distributed_process_group([backend])

Module Contents

class core.common.test_utils.ForkedPdb(completekey='tab', stdin=None, stdout=None, skip=None, nosigint=False, readrc=True)

Bases: pdb.Pdb

A Pdb subclass that may be used from a forked multiprocessing child. See https://stackoverflow.com/questions/4716533/how-to-attach-debugger-to-a-python-subproccess/23654936#23654936

Example usage to debug a torch distributed run on rank 0:

    if torch.distributed.get_rank() == 0:
        from fairchem.core.common.test_utils import ForkedPdb
        ForkedPdb().set_trace()

interaction(*args, **kwargs)
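The linked Stack Overflow answer overrides interaction so that sys.stdin points at the terminal while the debugger prompt is active, since a multiprocessing child does not keep a usable stdin. The following is a minimal sketch of that idea, not necessarily the fairchem implementation: the class name ForkedPdbSketch is hypothetical, and it assumes a POSIX system where /dev/stdin exists.

```python
import pdb
import sys


class ForkedPdbSketch(pdb.Pdb):
    """Hypothetical stand-in that illustrates the technique; not the fairchem class."""

    def interaction(self, *args, **kwargs):
        # Remember the stdin the process currently has so it can be restored.
        saved_stdin = sys.stdin
        try:
            # Assumption: a POSIX-style /dev/stdin is available to reopen.
            with open("/dev/stdin") as terminal:
                sys.stdin = terminal
                super().interaction(*args, **kwargs)
        finally:
            # Put the original stdin back once the debugger session ends.
            sys.stdin = saved_stdin
```

Calling ForkedPdbSketch().set_trace() inside a child process would then open the usual pdb prompt on the launching terminal.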
class core.common.test_utils.PGConfig

backend: str
world_size: int
gp_group_size: int = 1
port: str = '12345'
use_gp: bool = True
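The attribute list above reads like a plain configuration record: two required fields followed by three defaults. A minimal sketch of an equivalent structure, assuming a dataclass (the page does not say how PGConfig is actually declared) and using the hypothetical name PGConfigSketch:

```python
from dataclasses import dataclass


@dataclass
class PGConfigSketch:
    """Hypothetical mirror of the documented fields; not the fairchem class."""

    backend: str            # required, e.g. "gloo" or "nccl"
    world_size: int         # required
    gp_group_size: int = 1
    port: str = "12345"     # note: a string, not an int
    use_gp: bool = True


# Only the two fields without defaults have to be supplied.
cpu_config = PGConfigSketch(backend="gloo", world_size=2)
```

Under this assumption, the remaining fields fall back to the documented defaults when they are omitted.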
core.common.test_utils.init_env_rank_and_launch_test(rank: int, pg_setup_params: PGConfig, mp_output_dict: dict[int, object], test_method: callable, args: list[object], kwargs: dict[str, object]) → None

core.common.test_utils.init_pg_and_rank_and_launch_test(rank: int, pg_setup_params: PGConfig, mp_output_dict: dict[int, object], test_method: callable, args: list[object], kwargs: dict[str, object]) → None

core.common.test_utils.spawn_multi_process(config: PGConfig, test_method: callable, init_and_launch: callable, *test_method_args: Any, **test_method_kwargs: Any) → list[Any]

Spawn a single-node, multi-rank function. Uses localhost and a free port to communicate.

Parameters:
  • world_size – number of processes (the world_size field of config)

  • backend – backend to use (the backend field of config), for example “nccl” or “gloo”

  • test_method – callable to spawn. The first 3 arguments are rank, world_size and the mp output dict

  • test_method_args – positional arguments for the test method

  • test_method_kwargs – keyword arguments for the test method

Returns:

A list, l, where l[i] is the return value of test_method on rank i
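The return contract described above (one entry per rank, ordered by rank) can be illustrated with the standard library alone. This is a sketch of the pattern, not the fairchem implementation: spawn_ranks_sketch, _run_rank and rank_times_ten are hypothetical names, no process group is initialised, and it assumes the "fork" start method, so it is POSIX-only.

```python
import multiprocessing as mp
from typing import Any, Callable


def _run_rank(rank: int, world_size: int, output: dict, fn: Callable,
              args: tuple, kwargs: dict) -> None:
    # Each child stores its result under its own rank in the shared dict.
    output[rank] = fn(rank, world_size, *args, **kwargs)


def spawn_ranks_sketch(world_size: int, fn: Callable, *args: Any,
                       **kwargs: Any) -> list:
    # Assumption: "fork" is available, which avoids pickling fn.
    ctx = mp.get_context("fork")
    with ctx.Manager() as manager:
        output = manager.dict()  # shared between parent and children
        procs = [
            ctx.Process(target=_run_rank,
                        args=(rank, world_size, output, fn, args, kwargs))
            for rank in range(world_size)
        ]
        for proc in procs:
            proc.start()
        for proc in procs:
            proc.join()
        # Build the list while the manager is still alive; index i is rank i.
        return [output[rank] for rank in range(world_size)]


def rank_times_ten(rank: int, world_size: int) -> int:
    return rank * 10


if __name__ == "__main__":
    print(spawn_ranks_sketch(3, rank_times_ten))  # [0, 10, 20]
```

Collecting results through a manager-backed dict keyed by rank keeps the output order independent of which process finishes first.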

core.common.test_utils.init_local_distributed_process_group(backend='nccl')