core.common.distutils#
Copyright (c) Meta Platforms, Inc. and affiliates.
This source code is licensed under the MIT license found in the LICENSE file in the root directory of this source tree.
Attributes#

- T
- DISTRIBUTED_PORT
- CURRENT_DEVICE_TYPE_STR

Functions#

- os_environ_get_or_throw()
- get_init_method() – Get the initialization method for a distributed job based on the specified method type.
- setup()
- cleanup()
- initialized()
- get_rank()
- get_world_size()
- is_master()
- synchronize()
- broadcast()
- broadcast_object_list()
- all_reduce()
- all_gather()
- gather_objects() – Gather a list of pickleable objects into rank 0
- assign_device_for_local_rank()
- get_device_for_local_rank()
- setup_env_local()
Module Contents#
- core.common.distutils.T#
- core.common.distutils.DISTRIBUTED_PORT = 13356#
- core.common.distutils.CURRENT_DEVICE_TYPE_STR = 'CURRRENT_DEVICE_TYPE'#
- core.common.distutils.os_environ_get_or_throw(x: str) → str #
- core.common.distutils.get_init_method(init_method, world_size: int | None, rank: int | None = None, node_list: str | None = None, filename: str | None = None)#
Get the initialization method for a distributed job based on the specified method type.
- Parameters:
init_method – The initialization method type, either “tcp” or “file”.
world_size – The total number of processes in the distributed job.
rank – The rank of the current process (optional).
node_list – The list of nodes for SLURM-based distributed job (optional, used with “tcp”).
filename – The shared file path for file-based initialization (optional, used with “file”).
- Returns:
The initialization method string to be used by PyTorch’s distributed module.
- Raises:
ValueError – If an invalid init_method is provided.
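A minimal sketch of how get_init_method might be called, assuming the module is importable under the path shown on this page (adjust the package prefix to your installation); the filename and node_list values below are hypothetical, and the TCP branch is only expected to work inside a SLURM allocation.

```python
from core.common.distutils import get_init_method

# File-based rendezvous: every rank points at the same shared file.
file_init = get_init_method(
    "file",
    world_size=4,
    filename="/tmp/shared_init_file",  # hypothetical shared path
)

# TCP-based rendezvous, driven by a SLURM node list (hypothetical values).
tcp_init = get_init_method(
    "tcp",
    world_size=4,
    rank=0,
    node_list="node[001-002]",
)

# Either string is then handed to torch.distributed.init_process_group
# via its init_method argument.
```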
- core.common.distutils.setup(config) → None #
- core.common.distutils.cleanup() → None #
- core.common.distutils.initialized() → bool #
- core.common.distutils.get_rank() → int #
- core.common.distutils.get_world_size() → int #
- core.common.distutils.is_master() → bool #
- core.common.distutils.synchronize() → None #
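A common pattern with these helpers is to guard file I/O on the master rank and then synchronize so no rank races ahead; a minimal sketch, assuming a process group has already been created via setup(config) (or torch.distributed directly) and the import path shown on this page.

```python
import torch

from core.common import distutils


def save_checkpoint(state: dict, path: str) -> None:
    # Only the master rank (rank 0) writes the checkpoint file.
    if distutils.is_master():
        torch.save(state, path)
    # Barrier: make sure the file exists before any rank tries to read it.
    distutils.synchronize()


if distutils.initialized():
    print(f"rank {distutils.get_rank()} of {distutils.get_world_size()}")
```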
- core.common.distutils.broadcast(tensor: torch.Tensor, src, group=dist.group.WORLD, async_op: bool = False) → None #
- core.common.distutils.broadcast_object_list(object_list: list[Any], src: int, group=dist.group.WORLD, device: str | None = None) → None #
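A sketch of the two broadcast helpers, assuming they mirror the in-place semantics of their torch.distributed counterparts (consistent with the None return types) and that a process group is initialized.

```python
import torch

from core.common import distutils

# Broadcast a tensor from rank 0 to all ranks; the tensor is modified in place.
flag = torch.zeros(1)
if distutils.is_master():
    flag.fill_(1.0)
distutils.broadcast(flag, src=0)

# Broadcast a list of pickleable objects from rank 0; non-source ranks
# see their placeholder entries overwritten in place.
payload = [{"step": 100, "best_metric": 0.42}] if distutils.is_master() else [None]
distutils.broadcast_object_list(payload, src=0)
```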
- core.common.distutils.all_reduce(data, group=dist.group.WORLD, average: bool = False, device=None) → torch.Tensor #
- core.common.distutils.all_gather(data, group=dist.group.WORLD, device=None) → list[torch.Tensor] #
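A sketch of reducing and gathering per-rank tensors, assuming an initialized process group and that average=True divides the reduced sum by the world size, as the keyword suggests.

```python
import torch

from core.common import distutils

# Illustrative per-rank scalar, e.g. a local loss value.
local_loss = torch.tensor([0.1 * (distutils.get_rank() + 1)])

# Sum (or average) the value across all ranks.
mean_loss = distutils.all_reduce(local_loss, average=True)

# Collect one tensor per rank into a list.
per_rank_losses = distutils.all_gather(local_loss)
```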
- core.common.distutils.gather_objects(data: T, group: torch.distributed.ProcessGroup = dist.group.WORLD) → list[T] #
Gather a list of pickleable objects into rank 0
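Because gather_objects works on arbitrary pickleable objects, it is convenient for collecting small per-rank records onto rank 0; a minimal sketch, assuming only the master rank consumes the gathered list (check the implementation for what non-master ranks receive).

```python
from core.common import distutils

# Each rank contributes a small pickleable record.
record = {"rank": distutils.get_rank(), "num_samples": 128}

gathered = distutils.gather_objects(record)

if distutils.is_master():
    # On rank 0 the result holds one entry per rank.
    total_samples = sum(r["num_samples"] for r in gathered)
    print(total_samples)
```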
- core.common.distutils.assign_device_for_local_rank(cpu: bool, local_rank: int) → None #
- core.common.distutils.get_device_for_local_rank() → str #
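A sketch of the per-process device bookkeeping, assuming LOCAL_RANK is set by the launcher (e.g. torchrun or SLURM) and that assign_device_for_local_rank records the device that get_device_for_local_rank later returns.

```python
import os

import torch

from core.common import distutils

local_rank = int(os.environ.get("LOCAL_RANK", 0))

# Pin this process to one accelerator, or fall back to CPU if none is available.
distutils.assign_device_for_local_rank(
    cpu=not torch.cuda.is_available(),
    local_rank=local_rank,
)

# Downstream code fetches the assigned device string (e.g. "cuda:0" or "cpu").
device = distutils.get_device_for_local_rank()
model = torch.nn.Linear(8, 8).to(device)
```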
- core.common.distutils.setup_env_local()#