core.models.equiformer_v2.equiformer_v2#

Copyright (c) Meta, Inc. and its affiliates.

This source code is licensed under the MIT license found in the LICENSE file in the root directory of this source tree.

Attributes#

_AVG_NUM_NODES

_AVG_DEGREE

Classes#

EquiformerV2ForceHead

Force output head for the EquiformerV2 backbone.

EquiformerV2EnergyHead

Energy output head for the EquiformerV2 backbone.

EquiformerV2Backbone

Equiformer with graph attention built upon SO(2) convolution and a feedforward network built upon S2 activation.

Module Contents#

core.models.equiformer_v2.equiformer_v2._AVG_NUM_NODES = 77.81317#
core.models.equiformer_v2.equiformer_v2._AVG_DEGREE = 23.395238876342773#
class core.models.equiformer_v2.equiformer_v2.EquiformerV2ForceHead(backbone)#

Bases: fairchem.core.models.equiformer_v2.heads.EqV2VectorHead

Base class for all neural network modules.

Your models should also subclass this class.

Modules can also contain other Modules, allowing them to be nested in a tree structure. You can assign the submodules as regular attributes:

import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return F.relu(self.conv2(x))

Submodules assigned in this way will be registered, and will have their parameters converted too when you call to(), etc.

Note

As per the example above, an __init__() call to the parent class must be made before assignment on the child.

Variables:

training (bool) – Boolean representing whether this module is in training or evaluation mode.
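
In practice, the head is constructed from an existing backbone (per the (backbone) constructor above) so it can reuse the backbone's configuration. A minimal sketch, assuming the classes are importable from the installed fairchem package under fairchem.core.models.equiformer_v2.equiformer_v2; the comment about the forward call is an assumption about the head interface, not something stated on this page:

from fairchem.core.models.equiformer_v2.equiformer_v2 import (
    EquiformerV2Backbone,
    EquiformerV2ForceHead,
)

# Build a backbone with its documented defaults (the full 12-layer model),
# then wrap it with the force head.
backbone = EquiformerV2Backbone()
force_head = EquiformerV2ForceHead(backbone)

# Assumption: like other fairchem heads, the head consumes the backbone's
# output embeddings at inference time, roughly force_head(data, backbone(data)).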

class core.models.equiformer_v2.equiformer_v2.EquiformerV2EnergyHead(backbone, reduce: str = 'sum')#

Bases: fairchem.core.models.equiformer_v2.heads.EqV2ScalarHead

Base class for all neural network modules.

Your models should also subclass this class.

Modules can also contain other Modules, allowing them to be nested in a tree structure. You can assign the submodules as regular attributes:

import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return F.relu(self.conv2(x))

Submodules assigned in this way will be registered, and will have their parameters converted too when you call to(), etc.

Note

As per the example above, an __init__() call to the parent class must be made before assignment on the child.

Variables:

training (bool) – Boolean representing whether this module is in training or evaluation mode.
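
The energy head is wired the same way. Its extra reduce argument defaults to 'sum' per the signature above; the interpretation that it controls how per-atom contributions are pooled into a per-structure energy is an assumption, not something stated on this page. A minimal sketch:

from fairchem.core.models.equiformer_v2.equiformer_v2 import (
    EquiformerV2Backbone,
    EquiformerV2EnergyHead,
)

backbone = EquiformerV2Backbone()

# reduce="sum" is the documented default; "mean" would be the natural
# alternative if the usual sum/mean pooling convention applies (an assumption).
energy_head = EquiformerV2EnergyHead(backbone, reduce="sum")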

class core.models.equiformer_v2.equiformer_v2.EquiformerV2Backbone(use_pbc: bool = True, use_pbc_single: bool = False, regress_forces: bool = True, otf_graph: bool = True, max_neighbors: int = 500, max_radius: float = 5.0, max_num_elements: int = 90, num_layers: int = 12, sphere_channels: int = 128, attn_hidden_channels: int = 128, num_heads: int = 8, attn_alpha_channels: int = 32, attn_value_channels: int = 16, ffn_hidden_channels: int = 512, norm_type: str = 'rms_norm_sh', lmax_list: list[int] | None = None, mmax_list: list[int] | None = None, grid_resolution: int | None = None, num_sphere_samples: int = 128, edge_channels: int = 128, use_atom_edge_embedding: bool = True, share_atom_edge_embedding: bool = False, use_m_share_rad: bool = False, distance_function: str = 'gaussian', num_distance_basis: int = 512, attn_activation: str = 'scaled_silu', use_s2_act_attn: bool = False, use_attn_renorm: bool = True, ffn_activation: str = 'scaled_silu', use_gate_act: bool = False, use_grid_mlp: bool = False, use_sep_s2_act: bool = True, alpha_drop: float = 0.1, drop_path_rate: float = 0.05, proj_drop: float = 0.0, weight_init: str = 'normal', enforce_max_neighbors_strictly: bool = True, avg_num_nodes: float | None = None, avg_degree: float | None = None, use_energy_lin_ref: bool | None = False, load_energy_lin_ref: bool | None = False, activation_checkpoint: bool | None = False)#

Bases: torch.nn.Module, fairchem.core.models.base.GraphModelMixin

Equiformer with graph attention built upon SO(2) convolution and a feedforward network built upon S2 activation. A minimal instantiation sketch follows the parameter list below.

Parameters:
  • use_pbc (bool) – Use periodic boundary conditions

  • use_pbc_single (bool) – Process batch PBC graphs one at a time

  • regress_forces (bool) – Compute forces

  • otf_graph (bool) – Compute graph On The Fly (OTF)

  • max_neighbors (int) – Maximum number of neighbors per atom

  • max_radius (float) – Maximum distance between neighboring atoms in Angstroms

  • max_num_elements (int) – Maximum atomic number

  • num_layers (int) – Number of layers in the GNN

  • sphere_channels (int) – Number of spherical channels (one set per resolution)

  • attn_hidden_channels (int) – Number of hidden channels used in SO(2) graph attention

  • num_heads (int) – Number of attention heads

  • attn_alpha_channels (int) – Number of channels for the alpha vector in each attention head

  • attn_value_channels (int) – Number of channels for the value vector in each attention head

  • ffn_hidden_channels (int) – Number of hidden channels used in the feedforward network

  • norm_type (str) – Type of normalization layer ([‘layer_norm’, ‘layer_norm_sh’, ‘rms_norm_sh’])

  • lmax_list (list[int]) – List of maximum degrees of the spherical harmonics (1 to 10)

  • mmax_list (list[int]) – List of maximum orders of the spherical harmonics (0 to lmax)

  • grid_resolution (int) – Resolution of SO3_Grid

  • num_sphere_samples (int) – Number of samples used to approximate the integration over the sphere in the output blocks

  • edge_channels (int) – Number of channels for the edge invariant features

  • use_atom_edge_embedding (bool) – Whether to use atomic embedding along with relative distance for edge scalar features

  • share_atom_edge_embedding (bool) – Whether to share atom_edge_embedding across all blocks

  • use_m_share_rad (bool) – Whether all m components within a type-L vector of one channel share radial function weights

  • distance_function ("gaussian", "sigmoid", "linearsigmoid", "silu") – Basis function used for distances

  • num_distance_basis (int) – Number of basis functions used to expand interatomic distances

  • attn_activation (str) – Type of activation function for SO(2) graph attention

  • use_s2_act_attn (bool) – Whether to use attention after S2 activation. Otherwise, use the same attention as Equiformer

  • use_attn_renorm (bool) – Whether to re-normalize attention weights

  • ffn_activation (str) – Type of activation function for feedforward network

  • use_gate_act (bool) – If True, use gate activation. Otherwise, use S2 activation

  • use_grid_mlp (bool) – If True, project features onto grids and apply MLPs for the feedforward networks (FFNs).

  • use_sep_s2_act (bool) – If True, use separable S2 activation when use_gate_act is False.

  • alpha_drop (float) – Dropout rate for attention weights

  • drop_path_rate (float) – Drop path rate

  • proj_drop (float) – Dropout rate for outputs of attention and FFN in Transformer blocks

  • weight_init (str) – Initialization scheme ([‘normal’, ‘uniform’]) for the weights of linear layers, except those in radial functions

  • enforce_max_neighbors_strictly (bool) – When edges are subselected based on the max_neighbors arg, arbitrarily select amongst equidistant / degenerate edges to have exactly the correct number.

  • avg_num_nodes (float) – Average number of nodes per graph

  • avg_degree (float) – Average degree of nodes in the graph

  • use_energy_lin_ref (bool) – Whether to add the per-atom energy references during prediction. During training and validation, this should be kept False since we use the lin_ref parameter in the OC22 dataloader to subtract the per-atom linear references from the energy targets. During prediction (where we don’t have energy targets), this can be set to True to add the per-atom linear references to the predicted energies.

  • load_energy_lin_ref (bool) – Whether to add nn.Parameters for the per-element energy references. This additional flag is there to ensure compatibility when strict-loading checkpoints, since the use_energy_lin_ref flag can be either True or False even if the model is trained with linear references. You can’t have use_energy_lin_ref = True and load_energy_lin_ref = False, since the model will not have the parameters for the linear references. All other combinations are fine.
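
A minimal instantiation sketch using only arguments documented above; the values shown are the documented defaults, so they are illustrative rather than a recommended configuration, and the import path is assumed to follow the installed fairchem package layout:

from fairchem.core.models.equiformer_v2.equiformer_v2 import EquiformerV2Backbone

backbone = EquiformerV2Backbone(
    max_radius=5.0,             # neighbor cutoff in Angstroms
    max_neighbors=500,          # cap on neighbors per atom
    otf_graph=True,             # build the radius graph on the fly
    alpha_drop=0.1,             # attention-weight dropout
    drop_path_rate=0.05,        # stochastic depth
    use_energy_lin_ref=False,   # keep False for training/validation (see above)
    load_energy_lin_ref=False,  # must be True whenever use_energy_lin_ref is True
)

# forward(data) takes a torch_geometric Batch and returns a dict of tensors
# (see the forward signature below); the energy/force heads consume that dict.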

activation_checkpoint#
use_pbc#
use_pbc_single#
regress_forces#
otf_graph#
max_neighbors#
max_radius#
cutoff#
max_num_elements#
num_layers#
sphere_channels#
attn_hidden_channels#
num_heads#
attn_alpha_channels#
attn_value_channels#
ffn_hidden_channels#
norm_type#
lmax_list#
mmax_list#
grid_resolution#
num_sphere_samples#
edge_channels#
use_atom_edge_embedding#
share_atom_edge_embedding#
use_m_share_rad#
distance_function#
num_distance_basis#
attn_activation#
use_s2_act_attn#
use_attn_renorm#
ffn_activation#
use_gate_act#
use_grid_mlp#
use_sep_s2_act#
alpha_drop#
drop_path_rate#
proj_drop#
avg_num_nodes#
avg_degree#
use_energy_lin_ref#
load_energy_lin_ref#
weight_init#
enforce_max_neighbors_strictly#
device = 'cpu'#
grad_forces = False#
num_resolutions: int#
sphere_channels_all: int#
sphere_embedding#
edge_channels_list#
SO3_rotation#
mappingReduced#
SO3_grid#
edge_degree_embedding#
blocks#
norm#
forward(data: torch_geometric.data.batch.Batch) → dict[str, torch.Tensor]#
_init_gp_partitions(atomic_numbers_full, data_batch_full, edge_index, edge_distance, edge_distance_vec)#

Graph parallelism: creates the required partial tensors for each rank from the full tensors. The tensors are split along the node-index dimension using node_partition.
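
The sketch below only illustrates the node-wise split described here (slicing node-indexed tensors into the chunk owned by each rank); it is not the actual graph-parallel implementation, and split_by_node_partition / node_partition are hypothetical names:

import torch

def split_by_node_partition(x: torch.Tensor, node_partition: torch.Tensor) -> torch.Tensor:
    # x is indexed by node along dim 0 (e.g. atomic numbers or node features);
    # node_partition holds the node indices owned by the current rank.
    return x[node_partition]

# Toy illustration: rank 0 owns nodes {0, 1}, rank 1 owns nodes {2, 3, 4}.
atomic_numbers_full = torch.tensor([1, 6, 8, 8, 1])
rank0_part = split_by_node_partition(atomic_numbers_full, torch.tensor([0, 1]))
rank1_part = split_by_node_partition(atomic_numbers_full, torch.tensor([2, 3, 4]))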

_init_edge_rot_mat(data, edge_index, edge_distance_vec)#
property num_params#
no_weight_decay() → set#

Returns the set of parameters excluded from weight decay.
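
A common way to consume such a method is to build optimizer parameter groups that exempt the returned entries from weight decay. The sketch assumes no_weight_decay() returns parameter names (this page only states that it returns a set), so treat the matching logic as illustrative:

import torch

def build_param_groups(model: torch.nn.Module, weight_decay: float = 1e-3):
    # Assumption: model.no_weight_decay() yields parameter-name substrings that
    # should be exempt from weight decay.
    skip = model.no_weight_decay() if hasattr(model, "no_weight_decay") else set()
    decay, no_decay = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        (no_decay if any(s in name for s in skip) else decay).append(param)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]

# optimizer = torch.optim.AdamW(build_param_groups(backbone), lr=1e-4)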