core.models.equiformer_v2

Submodules#

Classes#

EquiformerV2

THIS CLASS HAS BEEN DEPRECATED! Please use "EquiformerV2BackboneAndHeads"

Package Contents#

class core.models.equiformer_v2.EquiformerV2(use_pbc: bool = True, use_pbc_single: bool = False, regress_forces: bool = True, otf_graph: bool = True, max_neighbors: int = 500, max_radius: float = 5.0, max_num_elements: int = 90, num_layers: int = 12, sphere_channels: int = 128, attn_hidden_channels: int = 128, num_heads: int = 8, attn_alpha_channels: int = 32, attn_value_channels: int = 16, ffn_hidden_channels: int = 512, norm_type: str = 'rms_norm_sh', lmax_list: list[int] | None = None, mmax_list: list[int] | None = None, grid_resolution: int | None = None, num_sphere_samples: int = 128, edge_channels: int = 128, use_atom_edge_embedding: bool = True, share_atom_edge_embedding: bool = False, use_m_share_rad: bool = False, distance_function: str = 'gaussian', num_distance_basis: int = 512, attn_activation: str = 'scaled_silu', use_s2_act_attn: bool = False, use_attn_renorm: bool = True, ffn_activation: str = 'scaled_silu', use_gate_act: bool = False, use_grid_mlp: bool = False, use_sep_s2_act: bool = True, alpha_drop: float = 0.1, drop_path_rate: float = 0.05, proj_drop: float = 0.0, weight_init: str = 'normal', enforce_max_neighbors_strictly: bool = True, avg_num_nodes: float | None = None, avg_degree: float | None = None, use_energy_lin_ref: bool | None = False, load_energy_lin_ref: bool | None = False)#

Bases: torch.nn.Module, fairchem.core.models.base.GraphModelMixin

THIS CLASS HAS BEEN DEPRECATED! Please use “EquiformerV2BackboneAndHeads”

Equiformer with graph attention built upon SO(2) convolution and feedforward network built upon S2 activation

Parameters:
  • use_pbc (bool) – Use periodic boundary conditions

  • use_pbc_single (bool) – Process batch PBC graphs one at a time

  • regress_forces (bool) – Compute forces

  • otf_graph (bool) – Compute graph On The Fly (OTF)

  • max_neighbors (int) – Maximum number of neighbors per atom

  • max_radius (float) – Maximum distance between neighboring atoms in Angstroms

  • max_num_elements (int) – Maximum atomic number

  • num_layers (int) – Number of layers in the GNN

  • sphere_channels (int) – Number of spherical channels (one set per resolution)

  • attn_hidden_channels (int) – Number of hidden channels used during SO(2) graph attention

  • num_heads (int) – Number of attention heads

  • attn_alpha_channels (int) – Number of channels for alpha vector in each attention head

  • attn_value_channels (int) – Number of channels for value vector in each attention head

  • ffn_hidden_channels (int) – Number of hidden channels used during feedforward network

  • norm_type (str) – Type of normalization layer ([‘layer_norm’, ‘layer_norm_sh’, ‘rms_norm_sh’])

  • lmax_list (list[int]) – List of maximum degrees of the spherical harmonics (1 to 10)

  • mmax_list (list[int]) – List of maximum orders of the spherical harmonics (0 to lmax)

  • grid_resolution (int) – Resolution of SO3_Grid

  • num_sphere_samples (int) – Number of samples used to approximate the integration of the sphere in the output blocks

  • edge_channels (int) – Number of channels for the edge invariant features

  • use_atom_edge_embedding (bool) – Whether to use atomic embedding along with relative distance for edge scalar features

  • share_atom_edge_embedding (bool) – Whether to share atom_edge_embedding across all blocks

  • use_m_share_rad (bool) – Whether all m components within a type-L vector of one channel share radial function weights

  • distance_function ("gaussian", "sigmoid", "linearsigmoid", "silu") – Basis function used for distances

  • attn_activation (str) – Type of activation function for SO(2) graph attention

  • use_s2_act_attn (bool) – Whether to use attention after S2 activation. Otherwise, use the same attention as Equiformer

  • use_attn_renorm (bool) – Whether to re-normalize attention weights

  • ffn_activation (str) – Type of activation function for feedforward network

  • use_gate_act (bool) – If True, use gate activation. Otherwise, use S2 activation

  • use_grid_mlp (bool) – If True, project to grids and apply MLPs for the FFNs.

  • use_sep_s2_act (bool) – If True, use separable S2 activation when use_gate_act is False.

  • alpha_drop (float) – Dropout rate for attention weights

  • drop_path_rate (float) – Drop path rate

  • proj_drop (float) – Dropout rate for outputs of attention and FFN in Transformer blocks

  • weight_init (str) – [‘normal’, ‘uniform’] initialization of weights of linear layers except those in radial functions

  • enforce_max_neighbors_strictly (bool) – When edges are subselected based on the max_neighbors arg, arbitrarily select amongst equidistant / degenerate edges to have exactly the correct number.

  • avg_num_nodes (float) – Average number of nodes per graph

  • avg_degree (float) – Average degree of nodes in the graph

  • use_energy_lin_ref (bool) – Whether to add the per-atom energy references during prediction. During training and validation, this should be kept False since we use the lin_ref parameter in the OC22 dataloader to subtract the per-atom linear references from the energy targets. During prediction (where we don’t have energy targets), this can be set to True to add the per-atom linear references to the predicted energies.

  • load_energy_lin_ref (bool) – Whether to add nn.Parameters for the per-element energy references. This additional flag is there to ensure compatibility when strict-loading checkpoints, since the use_energy_lin_ref flag can be either True or False even if the model is trained with linear references. You can’t have use_energy_lin_ref = True and load_energy_lin_ref = False, since the model will not have the parameters for the linear references. All other combinations are fine.
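
The constraints stated in the parameter list (the degree range for lmax_list, the order range for mmax_list, and the one forbidden combination of the two linear-reference flags) can be checked before a model is constructed. The following is a minimal sketch and not part of fairchem: `check_equiformer_kwargs` is a hypothetical helper, and it assumes that lmax_list and mmax_list have the same length, which the descriptions above do not state explicitly.

```python
def check_equiformer_kwargs(
    lmax_list,
    mmax_list,
    use_energy_lin_ref=False,
    load_energy_lin_ref=False,
):
    """Hypothetical pre-flight check; returns a list of problems (empty if none)."""
    problems = []

    # Assumption: one (lmax, mmax) pair per resolution, so the lengths must match.
    if len(lmax_list) != len(mmax_list):
        problems.append("lmax_list and mmax_list differ in length")

    for lmax, mmax in zip(lmax_list, mmax_list):
        # Documented range for the maximum degree: 1 to 10.
        if not 1 <= lmax <= 10:
            problems.append(f"lmax={lmax} outside 1..10")
        # Documented range for the maximum order: 0 to lmax.
        if not 0 <= mmax <= lmax:
            problems.append(f"mmax={mmax} outside 0..{lmax}")

    # The only forbidden flag combination: using the linear references
    # without having created the parameters that hold them.
    if use_energy_lin_ref and not load_energy_lin_ref:
        problems.append("use_energy_lin_ref=True requires load_energy_lin_ref=True")

    return problems
```

An empty return value means the documented constraints hold; each string in a non-empty result names one violated constraint.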

use_pbc#
use_pbc_single#
regress_forces#
otf_graph#
max_neighbors#
max_radius#
cutoff#
max_num_elements#
num_layers#
sphere_channels#
attn_hidden_channels#
num_heads#
attn_alpha_channels#
attn_value_channels#
ffn_hidden_channels#
norm_type#
lmax_list#
mmax_list#
grid_resolution#
num_sphere_samples#
edge_channels#
use_atom_edge_embedding#
share_atom_edge_embedding#
use_m_share_rad#
distance_function#
num_distance_basis#
attn_activation#
use_s2_act_attn#
use_attn_renorm#
ffn_activation#
use_gate_act#
use_grid_mlp#
use_sep_s2_act#
alpha_drop#
drop_path_rate#
proj_drop#
avg_num_nodes#
avg_degree#
use_energy_lin_ref#
load_energy_lin_ref#
weight_init#
enforce_max_neighbors_strictly#
device = 'cpu'#
grad_forces = False#
num_resolutions: int#
sphere_channels_all: int#
sphere_embedding#
edge_channels_list#
SO3_rotation#
mappingReduced#
SO3_grid#
edge_degree_embedding#
blocks#
norm#
energy_block#
_init_gp_partitions(atomic_numbers_full, data_batch_full, edge_index, edge_distance, edge_distance_vec)#

Graph Parallel: creates the partial tensors required by each rank from the full tensors. The tensors are split along the node-index dimension using node_partition.
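
As a rough illustration of splitting along the node index, the sketch below gives each rank a contiguous block of nodes and keeps the edges whose target node falls in that block. It is plain Python rather than the tensor code the model uses. The function names, the near-even contiguous split, and the choice to assign edges by target node are all assumptions made for the example, not a description of the actual implementation.

```python
def node_partition(num_nodes, world_size, rank):
    """Contiguous block of node indices owned by `rank` (hypothetical helper)."""
    base, extra = divmod(num_nodes, world_size)
    # The first `extra` ranks each receive one additional node.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


def partition_edges(edge_index, nodes):
    """Positions of the edges whose target node lies in `nodes`.

    `edge_index` is a pair (sources, targets) of equal-length lists.
    Assumption: an edge belongs to the rank that owns its target node.
    """
    _, targets = edge_index
    return [i for i, target in enumerate(targets) if target in nodes]
```

With 10 nodes and 3 ranks, the blocks have sizes 4, 3 and 3, and together they cover every node exactly once.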

forward(data)#
_init_edge_rot_mat(data, edge_index, edge_distance_vec)#
property num_params#
_init_weights(m)#
_uniform_init_rad_func_linear_weights(m)#
_uniform_init_linear_weights(m)#
no_weight_decay() → set#

Returns the set of parameters to which weight decay is not applied.
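
The return value of no_weight_decay() is typically used to build optimizer parameter groups. The sketch below assumes the returned collection holds parameter names in the form produced by `named_parameters()`. `build_param_groups` is a hypothetical helper, and the group dictionaries follow the per-parameter options format that torch.optim optimizers accept.

```python
def build_param_groups(named_params, no_decay_names, weight_decay):
    """Split parameters into a decayed group and a non-decayed group.

    named_params: iterable of (name, parameter) pairs.
    no_decay_names: names excluded from weight decay (assumed to match
        the names in `named_params`).
    """
    decay, no_decay = [], []
    for name, param in named_params:
        if name in no_decay_names:
            no_decay.append(param)
        else:
            decay.append(param)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]
```

With a model instance, this might be called as `build_param_groups(model.named_parameters(), model.no_weight_decay(), 1e-3)`, and the result passed as the first argument to an optimizer such as `torch.optim.AdamW`.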