Learning local equivariant representations for quantum operators

Zhanghao Zhouyin AI for Science Institute, Beijing 100080, China Department of Physics, McGill University, Montreal, Quebec, Canada H3A2T8    Zixi Gan AI for Science Institute, Beijing 100080, China Department of Chemistry, Zhejiang University, Hangzhou 310027, China    Shishir Kumar Pandey Birla Institute of Technology & Science, Pilani-Dubai Campus, Dubai 345055, UAE    Linfeng Zhang AI for Science Institute, Beijing 100080, China DP Technology, Beijing 100080, China    Qiangqiang Gu [email protected] [email protected] AI for Science Institute, Beijing 100080, China School of Mathematical Science, Peking University, Beijing 100871, China
(July 8, 2024)
Abstract

Predicting quantum operator matrices such as Hamiltonian, overlap, and density matrices in the density functional theory (DFT) framework is crucial for understanding material properties. Current methods often focus on individual operators and struggle with efficiency and scalability for large systems. Here we introduce a novel deep learning model, SLEM (strictly localized equivariant message-passing) for predicting multiple quantum operators, that achieves state-of-the-art accuracy while dramatically improving computational efficiency. SLEM’s key innovation is its strict locality-based design, constructing local, equivariant representations for quantum tensors while preserving physical symmetries. This enables complex many-body dependence without expanding the effective receptive field, leading to superior data efficiency and transferability. Using an innovative SO(2) convolution technique, SLEM reduces the computational complexity of high-order tensor products and is therefore capable of handling systems requiring the f𝑓fitalic_f and g𝑔gitalic_g orbitals in their basis sets. We demonstrate SLEM’s capabilities across diverse 2D and 3D materials, achieving high accuracy even with limited training data. SLEM’s design facilitates efficient parallelization, potentially extending DFT simulations to systems with device-level sizes, opening new possibilities for large-scale quantum simulations and high-throughput materials discovery.

1.   Introduction

Quantum operators, representing observables and the evolution of quantum systems, are the cornerstone of describing the microscopic world. In modern quantum science, the advent of density functional theory (DFT) [1, 2] has elevated single-particle quantum operators, such as the Kohn-Sham Hamiltonian, density matrix, and overlap matrix, to paramount importance in solving complex problems [3]. These operators play a crucial role in unraveling electronic structures, predicting material properties, and advancing quantum technologies. However, as we tackle larger and more complex systems, the efficient and accurate representation of these fundamental operators has emerged as a pressing challenge in computation, demanding new avenues for innovative methodologies.

Recent advances have incorporated machine learning techniques to accelerate DFT calculations by directly predicting DFT’s output of quantum operators, including charge density [4], overlapping matrix [5, 4], self-energy [6], wave function [4], and Hamiltonian matrix [7, 5, 8, 9, 4, 10].

Refer to caption
Figure 1: Illustration of the local design of SLEM in comparison with conventional MPNN on a 1D graph. (a) MPNN information aggregation. (b) SLEM information aggregation. Balls and sticks represent nodes and edges; arrows show aggregation direction. rcutsubscript𝑟cutr_{\text{cut}}italic_r start_POSTSUBSCRIPT cut end_POSTSUBSCRIPT: pre-defined cut-off; reffsubscript𝑟effr_{\text{eff}}italic_r start_POSTSUBSCRIPT eff end_POSTSUBSCRIPT: effective cut-off after two hidden layer updates. L𝐿Litalic_L is the layer index.

By circumventing self-consistent DFT calculations, such methods have the potential to scale up the electronic structure calculations. Some of the approaches utilize Gaussian regressions [9], kernel-based models [9], and neural networks [11] to predict invariant Hamiltonian matrix blocks on localized frames. Notably, powerful equivariant message-passing neural networks (E-MPNNs) have demonstrated remarkable accuracy [12, 13, 14, 15, 16, 17, 18, 19, 20]. These networks ensure the equivariance of output tensor blocks, respecting the physical priors of atomic systems. Typically, they initialize the node and edge features with atomic species and the weighted projection of the spherical harmonics coefficient, then use iterative two-body updates to build many-body interactions. Despite their high accuracy, these iterative updates invariably enlarge the receptive field of node and edge features, incorporating numerous effective neighbours. This expansion limits parallelization and, consequently, the model’s scalability. The storage-intensive nature of quantum tensor prediction tasks exacerbates these limitations, as high-dimensional features in edges and nodes necessitate substantial GPU memory during training and prediction. Consequently, training such models on large datasets or predicting quantum operators for extensive atomic structures poses considerable challenges.

Fortunately, the electrostatic screening counteracts the long-term dependency in a lot of material systems [21, 22, 23]. For example, for bulk Silicon, Ge, a typical screening length is around 5 a.u. [23]. Therefore, the predicted quantum operators can be decomposed into elements with localized dependency on atomic structures. Naturally, a strictly local model that avoids expanding the receptive field, while maintaining sufficient accuracy is at most preferred. In the field of machine-learning interatomic potentials (MLIPs), this concept has been applied in the Allegro model [13], where the strictly local representation of interatomic potentials achieves high accuracy comparable to E-MPNNs and is highly parallelizable, particularly for large systems. However, unlike modelling interatomic potential, which only concerns scalars (energy) and vectors (forces) (corresponding to angular momentum l=0𝑙0l=0italic_l = 0 and 1111) from node features, the prediction of the general representation of quantum operators necessitates targeting both node and edge features on high-order spherical tensors (even up to l=6𝑙6l=6italic_l = 6 or 8888). This adds complexity to constructing strictly localized models for node and edge updates. Another significant challenge in high-order tensor prediction tasks is computational complexity. To equivariantly mix the features of different angular momentum l𝑙litalic_l, tensor products that scale as 𝒪𝒪\mathcal{O}caligraphic_O(l6superscript𝑙6l^{6}italic_l start_POSTSUPERSCRIPT 6 end_POSTSUPERSCRIPT) are required [19]. Achieving high accuracy often necessitates up to hundreds of thousands of training steps, which significantly limits the applicability of current models to heavy elements. This limitation hinders the development of a unified machine-learning DFT model that generalizes across the periodic table. Therefore, a crucial question, whether one can employ the strictly localized scheme to develop powerful equivariant atom and bond features that can efficiently and accurately reproduce the corresponding quantum tensor block elements with reduced computational complexity, naturally arises.

This work presents a novel method, the strictly localized equivariant message-passing model (SLEM), implemented in the framework of DeePTB package [24] for efficient representations of quantum operators. SLEM employs a fully localized scheme to construct high-order node and edge equivariant features for a general representation of quantum operators, including the Hamiltonian and density matrix. As illustrated in Fig.1, the model embeds localized edge hidden states and utilizes them to construct localized node and edge features without including distant neighbours beyond a prefixed cutoff range rcutsubscript𝑟cutr_{\text{cut}}italic_r start_POSTSUBSCRIPT cut end_POSTSUBSCRIPT. This design enables the SLEM model to generalize more effectively, parallelize more easily, and scale to larger systems. Additionally, a fast and efficient SO(2) convolution [19] is implemented to reduce the 𝒪𝒪\mathcal{O}caligraphic_O(l6superscript𝑙6l^{6}italic_l start_POSTSUPERSCRIPT 6 end_POSTSUPERSCRIPT) complexity, with edge-specific training weights, thereby further enhancing the model’s accuracy. As for the overlap matrix, it is typically required along with other operators for property calculations. Previous works have used another E-MPNN that doubles the size of the networks [10] or extracts it from DFT calculations [8, 5], incurring additional costs or incorporating out-of-loop computation steps that complicate inference. In contrast, we utilized the two-center integrals nature and parameterized the overlap matrix with spherical independent scalars (i.e., Slater-Koster-like parameters [25]). This method, inspired by the early work of DeePTB [24], fits the overlap operator representation alongside other operators with minimal additional cost.

This paper is organized as follows. In section 2, we review the related works, focusing on message-passing neural networks and equivariant message passing. In section 3, we detail the architecture of the SLEM model. Section 4 presents our experimental results, comparing the accuracy and efficiency of our model with existing models in reported data sets, and conducting generalization experiments. Finally, we summarize our findings and discuss future directions.

2.   Related works

Message-passing Neural Networks

Message-passing neural networks (MPNNs) [26] have been widely applied in the modelling of atomic systems due to their exceptional accuracy in capturing the intricate relationships between atomic environments and physical properties [27]. In the message-passing scheme, atoms are treated as nodes in graphs, with bonds to neighbouring atoms represented as connected edges within a specified cutoff radius. The embedded atomic features are processed by trainable functions, generating messages from each edge to update the embeddings of central atoms. Formally, the MPNN framework can be summarized as follows:

𝐞ij,Lsuperscript𝐞𝑖𝑗𝐿\displaystyle\mathbf{e}^{ij,L}bold_e start_POSTSUPERSCRIPT italic_i italic_j , italic_L end_POSTSUPERSCRIPT =𝒩L(𝐧i,L1,𝐧j,L1,𝐞ij,L1)absentsubscript𝒩𝐿superscript𝐧𝑖𝐿1superscript𝐧𝑗𝐿1superscript𝐞𝑖𝑗𝐿1\displaystyle=\mathcal{N}_{L}\left(\mathbf{n}^{i,L-1},\mathbf{n}^{j,L-1},% \mathbf{e}^{ij,L-1}\right)= caligraphic_N start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT ( bold_n start_POSTSUPERSCRIPT italic_i , italic_L - 1 end_POSTSUPERSCRIPT , bold_n start_POSTSUPERSCRIPT italic_j , italic_L - 1 end_POSTSUPERSCRIPT , bold_e start_POSTSUPERSCRIPT italic_i italic_j , italic_L - 1 end_POSTSUPERSCRIPT )
𝐦ij,Lsuperscript𝐦𝑖𝑗𝐿\displaystyle\mathbf{m}^{ij,L}bold_m start_POSTSUPERSCRIPT italic_i italic_j , italic_L end_POSTSUPERSCRIPT =ML(𝐧i,L1,𝐧j,L1,𝐞ij,L)absentsubscript𝑀𝐿superscript𝐧𝑖𝐿1superscript𝐧𝑗𝐿1superscript𝐞𝑖𝑗𝐿\displaystyle=M_{L}\left(\mathbf{n}^{i,L-1},\mathbf{n}^{j,L-1},\mathbf{e}^{ij,% L}\right)= italic_M start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT ( bold_n start_POSTSUPERSCRIPT italic_i , italic_L - 1 end_POSTSUPERSCRIPT , bold_n start_POSTSUPERSCRIPT italic_j , italic_L - 1 end_POSTSUPERSCRIPT , bold_e start_POSTSUPERSCRIPT italic_i italic_j , italic_L end_POSTSUPERSCRIPT )
𝐧i,Lsuperscript𝐧𝑖𝐿\displaystyle\mathbf{n}^{i,L}bold_n start_POSTSUPERSCRIPT italic_i , italic_L end_POSTSUPERSCRIPT =UL(𝐧i,L1,j𝒩(i)𝐦ij,L)absentsubscript𝑈𝐿superscript𝐧𝑖𝐿1subscript𝑗𝒩𝑖superscript𝐦𝑖𝑗𝐿\displaystyle=U_{L}\left(\mathbf{n}^{i,L-1},\sum_{j\in\mathcal{N}(i)}\mathbf{m% }^{ij,L}\right)= italic_U start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT ( bold_n start_POSTSUPERSCRIPT italic_i , italic_L - 1 end_POSTSUPERSCRIPT , ∑ start_POSTSUBSCRIPT italic_j ∈ caligraphic_N ( italic_i ) end_POSTSUBSCRIPT bold_m start_POSTSUPERSCRIPT italic_i italic_j , italic_L end_POSTSUPERSCRIPT )

Here, 𝐞ij,Lsuperscript𝐞𝑖𝑗𝐿\mathbf{e}^{ij,L}bold_e start_POSTSUPERSCRIPT italic_i italic_j , italic_L end_POSTSUPERSCRIPT represents the edge features, 𝐦ij,Lsuperscript𝐦𝑖𝑗𝐿\mathbf{m}^{ij,L}bold_m start_POSTSUPERSCRIPT italic_i italic_j , italic_L end_POSTSUPERSCRIPT denotes the messages, and 𝐧i,Lsuperscript𝐧𝑖𝐿\mathbf{n}^{i,L}bold_n start_POSTSUPERSCRIPT italic_i , italic_L end_POSTSUPERSCRIPT indicates the node features at layer L𝐿Litalic_L. 𝒩Lsubscript𝒩𝐿\mathcal{N}_{L}caligraphic_N start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT, MLsubscript𝑀𝐿M_{L}italic_M start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT, and ULsubscript𝑈𝐿U_{L}italic_U start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT are the trainable functions for the edge, message, and node updates. 𝒩(i)={j|rij<rcut}𝒩𝑖conditional-set𝑗subscript𝑟𝑖𝑗subscript𝑟cut\mathcal{N}(i)=\{j|r_{ij}<r_{\text{cut}}\}caligraphic_N ( italic_i ) = { italic_j | italic_r start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT < italic_r start_POSTSUBSCRIPT cut end_POSTSUBSCRIPT } indicates all the neighbour atoms for atom i𝑖iitalic_i, with rcutsubscript𝑟cutr_{\text{cut}}italic_r start_POSTSUBSCRIPT cut end_POSTSUBSCRIPT being the predefined cutoff. This updating framework allows for the construction of many-body interactions and long-term dependencies, leading to strong performance across various applications. Previous works have predominantly utilized this updating scheme, achieving remarkably high precision in reported systems [28, 29, 12]. However, as updates progress, the effective cutoff radius (reff=N×rcutsubscript𝑟eff𝑁subscript𝑟cutr_{\text{eff}}=N\times r_{\text{cut}}italic_r start_POSTSUBSCRIPT eff end_POSTSUBSCRIPT = italic_N × italic_r start_POSTSUBSCRIPT cut end_POSTSUBSCRIPT) for each atom’s features grows linearly with the number of update steps (N𝑁Nitalic_N), as shown in Fig.1. Consequently, the effective neighbour list scales cubically with reffsubscript𝑟effr_{\text{eff}}italic_r start_POSTSUBSCRIPT eff end_POSTSUBSCRIPT, making parallelization intractable. This issue is particularly severe when fitting representations of quantum operators, as the rcutsubscript𝑟cutr_{\text{cut}}italic_r start_POSTSUBSCRIPT cut end_POSTSUBSCRIPT defined by the cutoff radius for the linear combination of atomic orbitals (LCAO) basis is often larger than that defined in other tasks. A localized approach to constructing many-body interactions without including distant atoms has been proposed. For example, the Allegro model [13] modifies the updating rules by incorporating a hidden atom-pair state that depends explicitly on the direction vector and the center atom embedding. While this framework works successfully for potential energy prediction, it requires further improvement for fitting more general edge and node features. Other local methods avoid iterative updates that enlarge the receptive field, instead creating manually designed descriptors of the local environment [30, 31, 32, 33, 34, 35]. These methods are generally local but are criticized for insufficiency in describing high-order interactions.

Refer to caption
Figure 2: The design of the SLEM model. (a) the hierarchy structure of the model. Starts with the structure information such as atomic number Zisubscript𝑍𝑖Z_{i}italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, and the radial and spherical part of the shift vector 𝒓ijsubscript𝒓𝑖𝑗\bm{r}_{ij}bold_italic_r start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT and 𝐘c,lijsubscriptsuperscript𝐘𝑖𝑗𝑐𝑙\mathbf{Y}^{ij}_{c,l}bold_Y start_POSTSUPERSCRIPT italic_i italic_j end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c , italic_l end_POSTSUBSCRIPT, the initialized hidden 𝐱ijsuperscript𝐱𝑖𝑗\mathbf{x}^{ij}bold_x start_POSTSUPERSCRIPT italic_i italic_j end_POSTSUPERSCRIPT, 𝐕ijsuperscript𝐕𝑖𝑗\mathbf{V}^{ij}bold_V start_POSTSUPERSCRIPT italic_i italic_j end_POSTSUPERSCRIPT, edge and node features 𝐞ijsuperscript𝐞𝑖𝑗\mathbf{e}^{ij}bold_e start_POSTSUPERSCRIPT italic_i italic_j end_POSTSUPERSCRIPT, 𝐧isuperscript𝐧𝑖\mathbf{n}^{i}bold_n start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT would be generated. The initialized two-body hidden features would be used to predict the Slater-Koster-like Overlap parameters directly and transformed to construct (off-)diagonal blocks for the overlap operator. Others then, will update iteratively using the design’s strict localized updating scheme. (b)-(d) shows the details of the hidden update (b), edge update (c), and node update (d). After the updates, the node and edge features would be used to construct the (off-)diagonal blocks using the Wigner-Eckart theorem for quantum operators such as density matrix and Hamiltonian.

Equivariant Message Passing

There are certain natures of the physical quantities that should be invariant or equivariant under e.g. the spatial and temporal operations. For example, under translation, rotation, and reflection, the scalars such as the energy of a system are invariant, while the tensors, such as forces and Hamiltonian matrices should transform accordingly. Such three kinds of operations constitute the group operations of E(3), known as the Euclidian Group (O(3) if without translation). To model such equivariant quantities, a set of neural network models has been developed that incorporate operations that preserve this equivariance. [36, 37, 38, 39] These neural networks possess physical priors, as their operations ensure the output transforms in sync with the input, making them more generalizable, accurate, and data-efficient in predicting physical quantities. Formally, an equivariant operation from vector space X to vector space Y is defined such that:

f(DX[g]x)=DY[g]f(x)gG,xXformulae-sequence𝑓subscript𝐷𝑋delimited-[]𝑔𝑥subscript𝐷𝑌delimited-[]𝑔𝑓𝑥formulae-sequencefor-all𝑔𝐺for-all𝑥𝑋\displaystyle f(D_{X}[g]x)=D_{Y}[g]f(x)\quad\forall g\in G,\forall x\in Xitalic_f ( italic_D start_POSTSUBSCRIPT italic_X end_POSTSUBSCRIPT [ italic_g ] italic_x ) = italic_D start_POSTSUBSCRIPT italic_Y end_POSTSUBSCRIPT [ italic_g ] italic_f ( italic_x ) ∀ italic_g ∈ italic_G , ∀ italic_x ∈ italic_X

where DX[g]GL(X)subscript𝐷𝑋delimited-[]𝑔𝐺𝐿𝑋D_{X}[g]\in GL(X)italic_D start_POSTSUBSCRIPT italic_X end_POSTSUBSCRIPT [ italic_g ] ∈ italic_G italic_L ( italic_X ) is the representation of group element g𝑔gitalic_g on vector space X𝑋Xitalic_X. Since the representation takes the form of the direct sum of irreducible representations of the group G𝐺Gitalic_G, it suggests that the vectors x𝑥xitalic_x and y𝑦yitalic_y themselves are composed by the irreducible representation (irreps for short) of group G𝐺Gitalic_G. Usually, we consider here for operations of group O(3)(since translational symmetry is naturally satisfied by using relative coordinates), the irreps are the spherical tensors with angular momentum index l𝑙litalic_l and parity p𝑝pitalic_p. So features x,y𝑥𝑦x,yitalic_x , italic_y should also be indexed by l,p𝑙𝑝l,pitalic_l , italic_p. For each irrep with angular momentum l𝑙litalic_l, an internal degree of freedom, denoted by m𝑚mitalic_m exists such that |m|l𝑚𝑙|m|\leq l| italic_m | ≤ italic_l. Therefore, each irreps of order l𝑙litalic_l belongs to a subspace of dimension 2l+12𝑙12l+12 italic_l + 1. Once the l,p𝑙𝑝l,pitalic_l , italic_p index is defined, one can construct the group operations, which are represented by the (2l+1)×(2l+1)2𝑙12𝑙1(2l+1)\times(2l+1)( 2 italic_l + 1 ) × ( 2 italic_l + 1 ) Wigner-D matrices.

Performing numerical operations on such irreps features follows certain rules. For the tensors with the same l𝑙litalic_l, addition and subtraction can be conducted. To mix up the tensors of different l𝑙litalic_l indices, an important operation called tensor product (tensor-product\otimes) is applied, which is considered the generalized operation of multiplication on spherical tensors. More formally:

(𝐱𝐲)m3l3=m1,m2C(l1,m1)(l2,m2)(l3,m3)𝐱m1l1𝐲m2l2subscriptsuperscripttensor-product𝐱𝐲subscript𝑙3subscript𝑚3subscriptsubscript𝑚1subscript𝑚2superscriptsubscript𝐶subscript𝑙1subscript𝑚1subscript𝑙2subscript𝑚2subscript𝑙3subscript𝑚3subscriptsuperscript𝐱subscript𝑙1subscript𝑚1subscriptsuperscript𝐲subscript𝑙2subscript𝑚2\displaystyle(\mathbf{x}\otimes\mathbf{y})^{l_{3}}_{m_{3}}=\sum_{m_{1},m_{2}}C% _{(l_{1},m_{1})(l_{2},m_{2})}^{(l_{3},m_{3})}\mathbf{x}^{l_{1}}_{m_{1}}\mathbf% {y}^{l_{2}}_{m_{2}}( bold_x ⊗ bold_y ) start_POSTSUPERSCRIPT italic_l start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_m start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_C start_POSTSUBSCRIPT ( italic_l start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) ( italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_l start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT ) end_POSTSUPERSCRIPT bold_x start_POSTSUPERSCRIPT italic_l start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT bold_y start_POSTSUPERSCRIPT italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT

Where C(l1,m2)(l2,m2)(l3,m3)superscriptsubscript𝐶subscript𝑙1subscript𝑚2subscript𝑙2subscript𝑚2subscript𝑙3subscript𝑚3C_{(l_{1},m_{2})(l_{2},m_{2})}^{(l_{3},m_{3})}italic_C start_POSTSUBSCRIPT ( italic_l start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) ( italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_l start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT ) end_POSTSUPERSCRIPT are Clebsch-Gordan (CG) coefficients. This operation is found extremely useful in constructing edge and node updating functions, as seen in []. However, the conventional tensor product has the time and memory scales of O(lmax6)𝑂superscriptsubscript𝑙𝑚𝑎𝑥6O(l_{max}^{6})italic_O ( italic_l start_POSTSUBSCRIPT italic_m italic_a italic_x end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 6 end_POSTSUPERSCRIPT ) [19] where lmaxsubscript𝑙𝑚𝑎𝑥l_{max}italic_l start_POSTSUBSCRIPT italic_m italic_a italic_x end_POSTSUBSCRIPT is the maximum angular momentum of 𝐱𝐱\mathbf{x}bold_x and 𝐲𝐲\mathbf{y}bold_y. Such complexity limits the application like the equivariant forces fields where high inference speed is vital. It poses an even greater challenge for quantum tensor prediction tasks, which require tensor products on very high-order spherical tensors. For example, constructing blocks of d𝑑ditalic_d-d𝑑ditalic_d, f𝑓fitalic_f-f𝑓fitalic_f and g𝑔gitalic_g-g𝑔gitalic_g orbital pairs require irreps of maximum order l=4𝑙4l=4italic_l = 4, l=6𝑙6l=6italic_l = 6 and l=8𝑙8l=8italic_l = 8, respectively. Such high costs make training for large-size systems nearly impossible.

3.   Model Architecure

III.1 Parameterize Equivariant Quantum Operators

The quantum operators O^^𝑂\hat{O}over^ start_ARG italic_O end_ARG, such as the Hamiltonian and density matrix in the LCAO-based DFT framework, follow the equivariance of the E(3)𝐸3E(3)italic_E ( 3 ) group and can thus be expressed in tensorized features using group theory. This process is illustrated in Fig. 2. Specifically, the E(3)𝐸3E(3)italic_E ( 3 ) group consists of spatial rotation, translation, and reflection. Translational symmetry is automatically satisfied by performing operations only on relative coordinates, while rotation and reflection require special treatments. Under the LCAO basis, the quantum tensor block can be expressed as:

Ol1,l2,m1,m2i,j=i,l1,m1|O^|j,l2,m2superscriptsubscript𝑂subscript𝑙1subscript𝑙2subscript𝑚1subscript𝑚2𝑖𝑗quantum-operator-product𝑖subscript𝑙1subscript𝑚1^𝑂𝑗subscript𝑙2subscript𝑚2\displaystyle O_{l_{1},l_{2},m_{1},m_{2}}^{i,j}=\langle i,l_{1},m_{1}|\hat{O}|% j,l_{2},m_{2}\rangleitalic_O start_POSTSUBSCRIPT italic_l start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i , italic_j end_POSTSUPERSCRIPT = ⟨ italic_i , italic_l start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT | over^ start_ARG italic_O end_ARG | italic_j , italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ⟩

Here i𝑖iitalic_i and j𝑗jitalic_j denote atomic sites, while l1,m1subscript𝑙1subscript𝑚1l_{1},m_{1}italic_l start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, and l2,m2subscript𝑙2subscript𝑚2l_{2},m_{2}italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT refer to the atomic orbitals of site i𝑖iitalic_i and site j𝑗jitalic_j respectively. The terms m1subscript𝑚1m_{1}italic_m start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and m2subscript𝑚2m_{2}italic_m start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT are the magnetic quantum numbers associated with the angular momentum l1subscript𝑙1l_{1}italic_l start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and l2subscript𝑙2l_{2}italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT. Ol1,l2,m1,m2i,jsuperscriptsubscript𝑂subscript𝑙1subscript𝑙2subscript𝑚1subscript𝑚2𝑖𝑗O_{l_{1},l_{2},m_{1},m_{2}}^{i,j}italic_O start_POSTSUBSCRIPT italic_l start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i , italic_j end_POSTSUPERSCRIPT are the matrix element of operator O^^𝑂\hat{O}over^ start_ARG italic_O end_ARG. When the coordinates of the system are transformed by a rotation operation 𝐑𝐑\mathbf{R}bold_R, equivariance requires the quantum tensor to transform accordingly as:

Ol1,l2,m1,m2i,j=m1,m2D(𝐑)m1,m1l1D(𝐑)m2,m2l2Ol1,l2,m1,m2i,jsubscriptsuperscript𝑂𝑖𝑗subscript𝑙1subscript𝑙2superscriptsubscript𝑚1superscriptsubscript𝑚2subscriptsubscript𝑚1subscript𝑚2𝐷subscriptsuperscript𝐑subscript𝑙1subscript𝑚1superscriptsubscript𝑚1superscript𝐷subscriptsuperscript𝐑subscript𝑙2subscript𝑚2superscriptsubscript𝑚2superscriptsubscript𝑂subscript𝑙1subscript𝑙2subscript𝑚1subscript𝑚2𝑖𝑗\displaystyle O^{\prime i,j}_{l_{1},l_{2},m_{1}^{\prime},m_{2}^{\prime}}=\sum_% {m_{1},m_{2}}D(\mathbf{R})^{l_{1}}_{m_{1},m_{1}^{\prime}}D^{*}(\mathbf{R})^{l_% {2}}_{m_{2},m_{2}^{\prime}}O_{l_{1},l_{2},m_{1},m_{2}}^{i,j}italic_O start_POSTSUPERSCRIPT ′ italic_i , italic_j end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , italic_m start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_m start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_D ( bold_R ) start_POSTSUPERSCRIPT italic_l start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_D start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ( bold_R ) start_POSTSUPERSCRIPT italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_O start_POSTSUBSCRIPT italic_l start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i , italic_j end_POSTSUPERSCRIPT (1)

Where Osuperscript𝑂O^{\prime}italic_O start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT is the equivariant block after transformation, and D(𝐑)m,ml𝐷subscriptsuperscript𝐑𝑙𝑚superscript𝑚D(\mathbf{R})^{l}_{m,m^{\prime}}italic_D ( bold_R ) start_POSTSUPERSCRIPT italic_l end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m , italic_m start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT is the Wigner-D matrix of order l𝑙litalic_l. To construct such equivariance blocks from irreps, the Wigner-Eckart theorem can be applied, which decomposes the operator indexed by l1,l2subscript𝑙1subscript𝑙2l_{1},l_{2}italic_l start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT into a single index l3subscript𝑙3l_{3}italic_l start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT that satisfies |l1l2|l3(l1+l2)subscript𝑙1subscript𝑙2subscript𝑙3subscript𝑙1subscript𝑙2|l_{1}-l_{2}|\leq l_{3}\leq(l_{1}+l_{2})| italic_l start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT - italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT | ≤ italic_l start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT ≤ ( italic_l start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ), which states:

ol3,m3i,j=l1,m1,l2,m2C(l1,m1)(l2,m2)(l3,m3)Ol1,l2,m1,m2i,jsuperscriptsubscript𝑜subscript𝑙3subscript𝑚3𝑖𝑗subscriptsubscript𝑙1subscript𝑚1subscript𝑙2subscript𝑚2superscriptsubscript𝐶subscript𝑙1subscript𝑚1subscript𝑙2subscript𝑚2subscript𝑙3subscript𝑚3subscriptsuperscript𝑂𝑖𝑗subscript𝑙1subscript𝑙2subscript𝑚1subscript𝑚2\displaystyle o_{l_{3},m_{3}}^{i,j}=\sum_{l_{1},m_{1},l_{2},m_{2}}C_{(l_{1},m_% {1})(l_{2},m_{2})}^{(l_{3},m_{3})}O^{i,j}_{l_{1},l_{2},m_{1},m_{2}}italic_o start_POSTSUBSCRIPT italic_l start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i , italic_j end_POSTSUPERSCRIPT = ∑ start_POSTSUBSCRIPT italic_l start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_C start_POSTSUBSCRIPT ( italic_l start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) ( italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_l start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT ) end_POSTSUPERSCRIPT italic_O start_POSTSUPERSCRIPT italic_i , italic_j end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT (2)

Here, the edge (ij𝑖𝑗i\neq jitalic_i ≠ italic_j) and node (i=j𝑖𝑗i=jitalic_i = italic_j) features ol3,m3i,jsuperscriptsubscript𝑜subscript𝑙3subscript𝑚3𝑖𝑗o_{l_{3},m_{3}}^{i,j}italic_o start_POSTSUBSCRIPT italic_l start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i , italic_j end_POSTSUPERSCRIPT are grouped by the index m𝑚mitalic_m into vectors of 𝐨c,li,jsuperscriptsubscript𝐨𝑐𝑙𝑖𝑗\mathbf{o}_{c,l}^{i,j}bold_o start_POSTSUBSCRIPT italic_c , italic_l end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i , italic_j end_POSTSUPERSCRIPT with c𝑐citalic_c accounting for multiple tensors for the same l𝑙litalic_l. These features can be computed for hopping (ij𝑖𝑗i\neq jitalic_i ≠ italic_j) and onsite (i=j𝑖𝑗i=jitalic_i = italic_j) elements of the quantum operators. Further by leveraging the Hermitian nature of quantum operators, the parameterized elements can be further reduced to upper diagonal blocks.

The decomposed features 𝐨c,li,jsuperscriptsubscript𝐨𝑐𝑙𝑖𝑗\mathbf{o}_{c,l}^{i,j}bold_o start_POSTSUBSCRIPT italic_c , italic_l end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i , italic_j end_POSTSUPERSCRIPT are aligned with the network prediction through a rescaling procedure. The target node/edge equivariant features 𝐨c,li,i/jsubscriptsuperscript𝐨𝑖𝑖𝑗𝑐𝑙\mathbf{o}^{i,i/j}_{c,l}bold_o start_POSTSUPERSCRIPT italic_i , italic_i / italic_j end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c , italic_l end_POSTSUBSCRIPT can be standarized and decomposed into constant norms and biases, and the normalized features that have balanced variance. Formally:

𝐨c,li,i/jsubscriptsuperscript𝐨𝑖𝑖𝑗𝑐𝑙\displaystyle\mathbf{o}^{i,i/j}_{c,l}bold_o start_POSTSUPERSCRIPT italic_i , italic_i / italic_j end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c , italic_l end_POSTSUBSCRIPT =σc,lZi,Zi/j𝐨^c,li,i/j+μc,lZi,Zi/jδl,0absentsubscriptsuperscript𝜎subscript𝑍𝑖subscript𝑍𝑖𝑗𝑐𝑙subscriptsuperscript^𝐨𝑖𝑖𝑗𝑐𝑙subscriptsuperscript𝜇subscript𝑍𝑖subscript𝑍𝑖𝑗𝑐𝑙subscript𝛿𝑙0\displaystyle=\sigma^{Z_{i},Z_{i/j}}_{c,l}\hat{\mathbf{o}}^{i,i/j}_{c,l}+\mu^{% Z_{i},Z_{i/j}}_{c,l}\delta_{l,0}= italic_σ start_POSTSUPERSCRIPT italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_Z start_POSTSUBSCRIPT italic_i / italic_j end_POSTSUBSCRIPT end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c , italic_l end_POSTSUBSCRIPT over^ start_ARG bold_o end_ARG start_POSTSUPERSCRIPT italic_i , italic_i / italic_j end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c , italic_l end_POSTSUBSCRIPT + italic_μ start_POSTSUPERSCRIPT italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_Z start_POSTSUBSCRIPT italic_i / italic_j end_POSTSUBSCRIPT end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c , italic_l end_POSTSUBSCRIPT italic_δ start_POSTSUBSCRIPT italic_l , 0 end_POSTSUBSCRIPT

Here, Zisubscript𝑍𝑖Z_{i}italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, Zjsubscript𝑍𝑗Z_{j}italic_Z start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT are the atom species of atom i𝑖iitalic_i and j𝑗jitalic_j, σc,lZi,Zi/jsubscriptsuperscript𝜎subscript𝑍𝑖subscript𝑍𝑖𝑗𝑐𝑙\sigma^{Z_{i},Z_{i/j}}_{c,l}italic_σ start_POSTSUPERSCRIPT italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_Z start_POSTSUBSCRIPT italic_i / italic_j end_POSTSUBSCRIPT end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c , italic_l end_POSTSUBSCRIPT and μc,lZi,Zi/jsubscriptsuperscript𝜇subscript𝑍𝑖subscript𝑍𝑖𝑗𝑐𝑙\mu^{Z_{i},Z_{i/j}}_{c,l}italic_μ start_POSTSUPERSCRIPT italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_Z start_POSTSUBSCRIPT italic_i / italic_j end_POSTSUBSCRIPT end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c , italic_l end_POSTSUBSCRIPT are norm and bias value for each atom and atom-pair. These values are derived from the dataset statistics and subsequently applied as weights and shift biases in an atom/bond type-specific scaling layer. The improvement of the standardization procedure on enhancing model convergence is evident. For the tensor operators in LCAO basis, the diagonal blocks (or node features) often exhibit highly unbalanced norms across irreps of different l𝑙litalic_l, hindering the model from learning a smooth function. Additionally, different irreps in the off-diagonal blocks (or edge features) demonstrate diverse decaying behaviors, making it challenging to learn these decays implicitly. By normalizing in this manner, the decay function for each irrep is constrained to a value ranging from 1 to 0. This facilitates the use of hidden states and a few layers of a Multilayer Perceptron (MLP) activated by a ReLU-like function, to explicitly project the decaying value from the preserved hidden scalar features for each edge. Detailed analysis of the decaying behavior of edge irreps and the norm of node irreps can be found in the supplementary materials (SM).

At this stage, the objective is to generate the node and edge features 𝐧c,lisubscriptsuperscript𝐧𝑖𝑐𝑙\mathbf{n}^{i}_{c,l}bold_n start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c , italic_l end_POSTSUBSCRIPT, 𝐞c,lijsubscriptsuperscript𝐞𝑖𝑗𝑐𝑙\mathbf{e}^{ij}_{c,l}bold_e start_POSTSUPERSCRIPT italic_i italic_j end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c , italic_l end_POSTSUBSCRIPT, and to supervise them towards achieving decomposed target features 𝐨^c,li,isubscriptsuperscript^𝐨𝑖𝑖𝑐𝑙\hat{\mathbf{o}}^{i,i}_{c,l}over^ start_ARG bold_o end_ARG start_POSTSUPERSCRIPT italic_i , italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c , italic_l end_POSTSUBSCRIPT and 𝐨^c,li,jsubscriptsuperscript^𝐨𝑖𝑗𝑐𝑙\hat{\mathbf{o}}^{i,j}_{c,l}over^ start_ARG bold_o end_ARG start_POSTSUPERSCRIPT italic_i , italic_j end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c , italic_l end_POSTSUBSCRIPT. Once a satisfactory accuracy is attained, the inversion of the Wigner-Eckart theorem facilitates the reconstruction of predicted operator representations such as Hamiltonian and density matrix blocks:

Ol1,l2,m1,m2i,j=l3,m3C(l1,m1)(l2,m2)(l3,m3)ol3,m3i,jsubscriptsuperscript𝑂𝑖𝑗subscript𝑙1subscript𝑙2subscript𝑚1subscript𝑚2subscriptsubscript𝑙3subscript𝑚3superscriptsubscript𝐶subscript𝑙1subscript𝑚1subscript𝑙2subscript𝑚2subscript𝑙3subscript𝑚3superscriptsubscript𝑜subscript𝑙3subscript𝑚3𝑖𝑗\displaystyle O^{i,j}_{l_{1},l_{2},m_{1},m_{2}}=\sum_{l_{3},m_{3}}C_{(l_{1},m_% {1})(l_{2},m_{2})}^{(l_{3},m_{3})}o_{l_{3},m_{3}}^{i,j}italic_O start_POSTSUPERSCRIPT italic_i , italic_j end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_l start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_C start_POSTSUBSCRIPT ( italic_l start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) ( italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_l start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT ) end_POSTSUPERSCRIPT italic_o start_POSTSUBSCRIPT italic_l start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i , italic_j end_POSTSUPERSCRIPT

The whole procedure satisfies the rotational, transformational and reflectional symmetry.

III.2 Parameterize of Invariant Overlap Operators

The overlap matrix, analogous to other quantum operators, is defined as:

Sl1,l2,m1,m2i,jsubscriptsuperscript𝑆𝑖𝑗subscript𝑙1subscript𝑙2subscript𝑚1subscript𝑚2\displaystyle S^{i,j}_{l_{1},l_{2},m_{1},m_{2}}italic_S start_POSTSUPERSCRIPT italic_i , italic_j end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT =i,l1,m1|j,l2,m2absentinner-product𝑖subscript𝑙1subscript𝑚1𝑗subscript𝑙2subscript𝑚2\displaystyle=\langle i,l_{1},m_{1}|j,l_{2},m_{2}\rangle= ⟨ italic_i , italic_l start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT | italic_j , italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ⟩
=ϕm1i,l1(𝐫)ϕm2j,l2(𝐫𝐫ij)𝑑𝐫absentsubscriptsuperscriptitalic-ϕ𝑖subscript𝑙1subscript𝑚1𝐫subscriptsuperscriptitalic-ϕ𝑗subscript𝑙2subscript𝑚2𝐫subscript𝐫𝑖𝑗differential-d𝐫\displaystyle=\int{\phi^{i,l_{1}}_{m_{1}}(\mathbf{r})\phi^{j,l_{2}}_{m_{2}}}(% \mathbf{r}-\mathbf{r}_{ij})d\mathbf{r}= ∫ italic_ϕ start_POSTSUPERSCRIPT italic_i , italic_l start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( bold_r ) italic_ϕ start_POSTSUPERSCRIPT italic_j , italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( bold_r - bold_r start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT ) italic_d bold_r

where ϕmi,lsubscriptsuperscriptitalic-ϕ𝑖𝑙𝑚\phi^{i,l}_{m}italic_ϕ start_POSTSUPERSCRIPT italic_i , italic_l end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT is the LCAO basis. The overlap matrix also satisfies the equivariance relation defined in Eq.1. Since the equivariant dependency comes fully from the bases, it is possible to rotate them to align with the axis of 𝐫ijsubscript𝐫𝑖𝑗\mathbf{r}_{ij}bold_r start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT to reduce the angular dependence, simplifying the matrix elements further into scalars via the relation [40]:

ϕm1i,l1(𝐫)ϕm2j,l2(𝐫𝐫ij)𝑑𝐫=sl1,l2,|m1|Zi,Zj(rij)δm1,m2subscriptsuperscriptitalic-ϕ𝑖subscript𝑙1subscript𝑚1𝐫subscriptsuperscriptitalic-ϕ𝑗subscript𝑙2subscript𝑚2𝐫subscript𝐫𝑖𝑗differential-d𝐫subscriptsuperscript𝑠subscript𝑍𝑖subscript𝑍𝑗subscript𝑙1subscript𝑙2subscript𝑚1subscript𝑟𝑖𝑗subscript𝛿subscript𝑚1subscript𝑚2\displaystyle\int{\phi^{i,l_{1}}_{m_{1}}(\mathbf{r})\phi^{j,l_{2}}_{m_{2}}}(% \mathbf{r}-\mathbf{r}_{ij})d\mathbf{r}=s^{Z_{i},Z_{j}}_{l_{1},l_{2},|m_{1}|}({% r}_{ij})\delta_{m_{1},m_{2}}∫ italic_ϕ start_POSTSUPERSCRIPT italic_i , italic_l start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( bold_r ) italic_ϕ start_POSTSUPERSCRIPT italic_j , italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( bold_r - bold_r start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT ) italic_d bold_r = italic_s start_POSTSUPERSCRIPT italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_Z start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , | italic_m start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT | end_POSTSUBSCRIPT ( italic_r start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT ) italic_δ start_POSTSUBSCRIPT italic_m start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT

Here, the dependence changes to Zisubscript𝑍𝑖Z_{i}italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, Zjsubscript𝑍𝑗Z_{j}italic_Z start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT, and rijsubscript𝑟𝑖𝑗r_{ij}italic_r start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT due to the two-center nature of the overlap integrals. Parameterizing these distance-dependent radial functions involves encoding atomic species with radial information and employing a simple MLP to learn the mapping to target scalars.

fl1,l2,|m|(Zi,Zj,rij)=sl1,l2,|m1|Zi,Zj(rij)δm1,m2subscript𝑓subscript𝑙1subscript𝑙2𝑚subscript𝑍𝑖subscript𝑍𝑗subscript𝑟𝑖𝑗subscriptsuperscript𝑠subscript𝑍𝑖subscript𝑍𝑗subscript𝑙1subscript𝑙2subscript𝑚1subscript𝑟𝑖𝑗subscript𝛿subscript𝑚1subscript𝑚2\displaystyle f_{l_{1},l_{2},|m|}(Z_{i},Z_{j},r_{ij})=s^{Z_{i},Z_{j}}_{l_{1},l% _{2},|m_{1}|}(r_{ij})\delta_{m_{1},m_{2}}italic_f start_POSTSUBSCRIPT italic_l start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , | italic_m | end_POSTSUBSCRIPT ( italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_Z start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_r start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT ) = italic_s start_POSTSUPERSCRIPT italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_Z start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , | italic_m start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT | end_POSTSUBSCRIPT ( italic_r start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT ) italic_δ start_POSTSUBSCRIPT italic_m start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT (3)

After getting these parameters, we use them to construct the overlap matrix aligned with the bond axis, and then rotate it back to the original orientation.

III.3 The SLEM model

The SLEM model architecture is illustrated in Fig.2. The model maintains a set of features, including hidden features 𝐱ij,L,𝐕c,lij,Lsuperscript𝐱𝑖𝑗𝐿subscriptsuperscript𝐕𝑖𝑗𝐿𝑐𝑙\mathbf{x}^{ij,L},\mathbf{V}^{ij,L}_{c,l}bold_x start_POSTSUPERSCRIPT italic_i italic_j , italic_L end_POSTSUPERSCRIPT , bold_V start_POSTSUPERSCRIPT italic_i italic_j , italic_L end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c , italic_l end_POSTSUBSCRIPT, node features 𝐧c,li,Lsubscriptsuperscript𝐧𝑖𝐿𝑐𝑙\mathbf{n}^{i,L}_{c,l}bold_n start_POSTSUPERSCRIPT italic_i , italic_L end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c , italic_l end_POSTSUBSCRIPT and edge features 𝐞c,lij,Lsubscriptsuperscript𝐞𝑖𝑗𝐿𝑐𝑙\mathbf{e}^{ij,L}_{c,l}bold_e start_POSTSUPERSCRIPT italic_i italic_j , italic_L end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c , italic_l end_POSTSUBSCRIPT of hidden layer (L𝐿Litalic_L). Specifically, the hidden features consist of a scalar channel 𝐱ij,Lsuperscript𝐱𝑖𝑗𝐿\mathbf{x}^{ij,L}bold_x start_POSTSUPERSCRIPT italic_i italic_j , italic_L end_POSTSUPERSCRIPT and a tensor channel 𝐕c,lij,Lsubscriptsuperscript𝐕𝑖𝑗𝐿𝑐𝑙\mathbf{V}^{ij,L}_{c,l}bold_V start_POSTSUPERSCRIPT italic_i italic_j , italic_L end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c , italic_l end_POSTSUBSCRIPT. The scalar channel is initialized as an embedded vector containing two-body information, including the atomic species and the radial distance of the atom pair. These initialized two-body radial features are then processed through an MLP to the invariant Slater-Koster-like parameters of overlap. For equivariant quantum operators, such as the Hamiltonian and density matrix, the scalar and tensor channels interact to generate an ordered atom-pair representation, which is used to construct local node and edge representations. After iterative updates, the representation is scaled by the statistical norm and biases, thereby achieving the final prediction.

III.3.1 Feature initialization

Firstly, the initial scalar hidden feature is computed from the two-body embeddings of atomic species Zisubscript𝑍𝑖Z_{i}italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and Zjsubscript𝑍𝑗Z_{j}italic_Z start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT, and the radial distance, as follows:

𝐱ij,L=0=MLP2-body(𝟏(Zi)𝟏(Zj)B(rij))u(rij)superscript𝐱𝑖𝑗𝐿0subscriptMLP2-body1subscript𝑍𝑖norm1subscript𝑍𝑗Bsubscript𝑟𝑖𝑗𝑢subscript𝑟𝑖𝑗\displaystyle\mathbf{x}^{ij,L=0}=\textbf{MLP}_{\text{2-body}}\left(\mathbf{1}(% Z_{i})||\mathbf{1}(Z_{j})||\textbf{B}(r_{ij})\right)\cdot u(r_{ij})bold_x start_POSTSUPERSCRIPT italic_i italic_j , italic_L = 0 end_POSTSUPERSCRIPT = MLP start_POSTSUBSCRIPT 2-body end_POSTSUBSCRIPT ( bold_1 ( italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) | | bold_1 ( italic_Z start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) | | B ( italic_r start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT ) ) ⋅ italic_u ( italic_r start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT )

Here, atom species are embedded with one-hot vectors denoted as 𝟏(Z)1𝑍\mathbf{1}(Z)bold_1 ( italic_Z ), and a set of trainable Bessel bases B(rij)Bsubscript𝑟𝑖𝑗\textbf{B}(r_{ij})B ( italic_r start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT ) is utilized to encode the distance rijsubscript𝑟𝑖𝑗r_{ij}italic_r start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT between atoms i𝑖iitalic_i and j𝑗jitalic_j. u(rij)𝑢subscript𝑟𝑖𝑗u(r_{ij})italic_u ( italic_r start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT ) are envelope functions [41] to add explicit radial dependence. Subsequently, the edge and hidden features are initialized as weighted spherical harmonics of relative edge vectors:

𝐕c,lij,L=0subscriptsuperscript𝐕𝑖𝑗𝐿0𝑐𝑙\displaystyle\mathbf{V}^{ij,L=0}_{c,l}bold_V start_POSTSUPERSCRIPT italic_i italic_j , italic_L = 0 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c , italic_l end_POSTSUBSCRIPT =wc,l(LN(𝐱ij,L=0))𝐘lijabsentsubscript𝑤𝑐𝑙subscript𝐿𝑁superscript𝐱𝑖𝑗𝐿0subscriptsuperscript𝐘𝑖𝑗𝑙\displaystyle=w_{c,l}\left(L_{N}\left(\mathbf{x}^{ij,L=0}\right)\right)\mathbf% {Y}^{ij}_{l}= italic_w start_POSTSUBSCRIPT italic_c , italic_l end_POSTSUBSCRIPT ( italic_L start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT ( bold_x start_POSTSUPERSCRIPT italic_i italic_j , italic_L = 0 end_POSTSUPERSCRIPT ) ) bold_Y start_POSTSUPERSCRIPT italic_i italic_j end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT
𝐞c,lij,L=0subscriptsuperscript𝐞𝑖𝑗𝐿0𝑐𝑙\displaystyle\mathbf{e}^{ij,L=0}_{c,l}bold_e start_POSTSUPERSCRIPT italic_i italic_j , italic_L = 0 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c , italic_l end_POSTSUBSCRIPT =wc,l(𝐱ij,L=0)𝐘lijabsentsubscript𝑤𝑐𝑙superscript𝐱𝑖𝑗𝐿0subscriptsuperscript𝐘𝑖𝑗𝑙\displaystyle=w_{c,l}\left(\mathbf{x}^{ij,L=0}\right)\mathbf{Y}^{ij}_{l}= italic_w start_POSTSUBSCRIPT italic_c , italic_l end_POSTSUBSCRIPT ( bold_x start_POSTSUPERSCRIPT italic_i italic_j , italic_L = 0 end_POSTSUPERSCRIPT ) bold_Y start_POSTSUPERSCRIPT italic_i italic_j end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT

Here, the weights are learned from the initialized scalar hidden features 𝐱ij,L=0superscript𝐱𝑖𝑗𝐿0\mathbf{x}^{ij,L=0}bold_x start_POSTSUPERSCRIPT italic_i italic_j , italic_L = 0 end_POSTSUPERSCRIPT. Layer normalization LNsubscript𝐿𝑁L_{N}italic_L start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT ensures that the hidden tensor features have a balanced amplitude of each edge. The initial node features are then computed as linear transformations of the aggregated edge features:

𝐧c,li,L=0=Linear(1Navgj𝒩(i)𝐞c,lij,L=0)superscriptsubscript𝐧𝑐𝑙𝑖𝐿0Linear1subscript𝑁𝑎𝑣𝑔subscript𝑗𝒩𝑖subscriptsuperscript𝐞𝑖𝑗𝐿0𝑐𝑙\displaystyle\mathbf{n}_{c,l}^{i,L=0}=\textbf{Linear}\left(\frac{1}{\sqrt{N_{% avg}}}\sum_{j\in\mathcal{N}(i)}\mathbf{e}^{ij,L=0}_{c,l}\right)bold_n start_POSTSUBSCRIPT italic_c , italic_l end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i , italic_L = 0 end_POSTSUPERSCRIPT = Linear ( divide start_ARG 1 end_ARG start_ARG square-root start_ARG italic_N start_POSTSUBSCRIPT italic_a italic_v italic_g end_POSTSUBSCRIPT end_ARG end_ARG ∑ start_POSTSUBSCRIPT italic_j ∈ caligraphic_N ( italic_i ) end_POSTSUBSCRIPT bold_e start_POSTSUPERSCRIPT italic_i italic_j , italic_L = 0 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c , italic_l end_POSTSUBSCRIPT )

Here 𝒩(i)𝒩𝑖\mathcal{N}(i)caligraphic_N ( italic_i ) and Navgsubscript𝑁𝑎𝑣𝑔N_{avg}italic_N start_POSTSUBSCRIPT italic_a italic_v italic_g end_POSTSUBSCRIPT are the neighbouring atoms and the average number of neighbours of atomi𝑖-i- italic_i.

III.3.2 Speed up tensor product

To integrate the information from the equivariant features, the tensor product is employed in all updating blocks of the SLEM model. Generally, the tensor product in SLEM is performed with the concatenated equivariant features 𝐟~c,lijsubscriptsuperscript~𝐟𝑖𝑗𝑐𝑙\tilde{\mathbf{f}}^{ij}_{c,l}over~ start_ARG bold_f end_ARG start_POSTSUPERSCRIPT italic_i italic_j end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c , italic_l end_POSTSUBSCRIPT and the weighted projection of the edge shift vector 𝒓ij=𝒓i𝒓jsubscript𝒓𝑖𝑗subscript𝒓𝑖subscript𝒓𝑗{\bm{r}}_{ij}={\bm{r}}_{i}-{\bm{r}}_{j}bold_italic_r start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT = bold_italic_r start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - bold_italic_r start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT on the spherical harmonics function 𝐘lijsuperscriptsubscript𝐘𝑙𝑖𝑗\mathbf{Y}_{l}^{ij}bold_Y start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i italic_j end_POSTSUPERSCRIPT. Formally:

𝐟c3,l3ij=𝐟~c1,l1ijwc2,l2ij𝐘l2ij=c1,l1,l2w~c1,l1,l2ijm1,m2C(l1,m1)(l2,m2)(l3,m3)fc1,l1,m1ijYl2,m2ijsubscriptsuperscript𝐟𝑖𝑗subscript𝑐3subscript𝑙3tensor-productsubscriptsuperscript~𝐟𝑖𝑗subscript𝑐1subscript𝑙1subscriptsuperscript𝑤𝑖𝑗subscript𝑐2subscript𝑙2superscriptsubscript𝐘subscript𝑙2𝑖𝑗subscriptsubscript𝑐1subscript𝑙1subscript𝑙2superscriptsubscript~𝑤subscript𝑐1subscript𝑙1subscript𝑙2𝑖𝑗subscriptsubscript𝑚1subscript𝑚2superscriptsubscript𝐶subscript𝑙1subscript𝑚1subscript𝑙2subscript𝑚2subscript𝑙3subscript𝑚3subscriptsuperscript𝑓𝑖𝑗subscript𝑐1subscript𝑙1subscript𝑚1superscriptsubscript𝑌subscript𝑙2subscript𝑚2𝑖𝑗\begin{split}\mathbf{f}^{ij}_{c_{3},l_{3}}&=\tilde{\mathbf{f}}^{ij}_{c_{1},l_{% 1}}\otimes w^{ij}_{c_{2},l_{2}}\mathbf{Y}_{l_{2}}^{ij}\\ &=\sum_{c_{1},l_{1},l_{2}}\tilde{w}_{c_{1},l_{1},l_{2}}^{ij}\sum_{m_{1},m_{2}}% C_{(l_{1},m_{1})(l_{2},m_{2})}^{(l_{3},m_{3})}\\ &\cdot f^{ij}_{c_{1},l_{1},m_{1}}Y_{l_{2},m_{2}}^{ij}\\ \end{split}start_ROW start_CELL bold_f start_POSTSUPERSCRIPT italic_i italic_j end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_POSTSUBSCRIPT end_CELL start_CELL = over~ start_ARG bold_f end_ARG start_POSTSUPERSCRIPT italic_i italic_j end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ⊗ italic_w start_POSTSUPERSCRIPT italic_i italic_j end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT bold_Y start_POSTSUBSCRIPT italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i italic_j end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL = ∑ start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT over~ start_ARG italic_w end_ARG start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i italic_j end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_m start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_C start_POSTSUBSCRIPT ( italic_l start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) ( italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_l start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT ) end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL ⋅ italic_f start_POSTSUPERSCRIPT italic_i italic_j end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_Y start_POSTSUBSCRIPT italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i italic_j end_POSTSUPERSCRIPT end_CELL end_ROW (4)

Here, w~c1,l1,l2ij=c2wc1,c2,l1,l2wc2,l2ijsuperscriptsubscript~𝑤subscript𝑐1subscript𝑙1subscript𝑙2𝑖𝑗subscriptsubscript𝑐2subscript𝑤subscript𝑐1subscript𝑐2subscript𝑙1subscript𝑙2subscriptsuperscript𝑤𝑖𝑗subscript𝑐2subscript𝑙2\tilde{w}_{c_{1},l_{1},l_{2}}^{ij}=\sum_{c_{2}}w_{c_{1},c_{2},l_{1},l_{2}}w^{% ij}_{c_{2},l_{2}}over~ start_ARG italic_w end_ARG start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i italic_j end_POSTSUPERSCRIPT = ∑ start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_w start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_c start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_w start_POSTSUPERSCRIPT italic_i italic_j end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT are edge-specific parameters for each tensor product operation. Performing such tensor products on high-order features is computationally intensive. Therefore, we applied the recently developed SO(2) [19] convolution to simplify, reducing the computation and storage complexity from O(lmax6)𝑂superscriptsubscript𝑙𝑚𝑎𝑥6O(l_{max}^{6})italic_O ( italic_l start_POSTSUBSCRIPT italic_m italic_a italic_x end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 6 end_POSTSUPERSCRIPT ) to O(lmax3)𝑂superscriptsubscript𝑙𝑚𝑎𝑥3O(l_{max}^{3})italic_O ( italic_l start_POSTSUBSCRIPT italic_m italic_a italic_x end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT ). The simplification idea is intuitive. 𝐘l2,m2ijsuperscriptsubscript𝐘subscript𝑙2subscript𝑚2𝑖𝑗\mathbf{Y}_{l_{2},m_{2}}^{ij}bold_Y start_POSTSUBSCRIPT italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i italic_j end_POSTSUPERSCRIPT are sparse tensors if rotated to align with the edge ij𝑖𝑗ijitalic_i italic_j, which is nonzero only for m2=0subscript𝑚20m_{2}=0italic_m start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT = 0. Therefore, it is easier to compute the tensor production in the direction of edge ij𝑖𝑗ijitalic_i italic_j, and rotate inversely the output afterwards. This step removes the m2subscript𝑚2m_{2}italic_m start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT index from the summation in Eq. LABEL:tp. Additionally, considering Clebsch-Gordan coefficients with m2=0subscript𝑚20m_{2}=0italic_m start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT = 0, we find that C(l1,m1)(l2,0)(l3,m3)=0superscriptsubscript𝐶subscript𝑙1subscript𝑚1subscript𝑙20subscript𝑙3subscript𝑚30C_{(l_{1},m_{1})(l_{2},0)}^{(l_{3},m_{3})}=0italic_C start_POSTSUBSCRIPT ( italic_l start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) ( italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , 0 ) end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_l start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT ) end_POSTSUPERSCRIPT = 0 except for m3=±m1subscript𝑚3plus-or-minussubscript𝑚1m_{3}=\pm m_{1}italic_m start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT = ± italic_m start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT. This allows further reduction of the summation in Eq. LABEL:tp by replacing ±m1plus-or-minussubscript𝑚1\pm m_{1}± italic_m start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT with a single index m𝑚mitalic_m. Then the operations can be reformulated formally as:

(fc,l,mijfc,l,mij)=c,l(wc,c,l,mwc,c,l,mwc,c,l,mwc,c,l,m)(f~c,l,mijf~c,l,mij)subscriptsuperscript𝑓𝑖𝑗𝑐𝑙𝑚subscriptsuperscript𝑓𝑖𝑗𝑐𝑙𝑚subscriptsuperscript𝑐superscript𝑙subscript𝑤𝑐superscript𝑐superscript𝑙𝑚subscript𝑤𝑐superscript𝑐superscript𝑙𝑚subscript𝑤𝑐superscript𝑐superscript𝑙𝑚subscript𝑤𝑐superscript𝑐superscript𝑙𝑚subscriptsuperscript~𝑓𝑖𝑗superscript𝑐superscript𝑙𝑚subscriptsuperscript~𝑓𝑖𝑗superscript𝑐superscript𝑙𝑚\left(\begin{array}[]{l}f^{ij}_{c,l,m}\\ f^{ij}_{c,l,-m}\end{array}\right)=\sum_{c^{\prime},l^{\prime}}\left(\begin{% array}[]{l}w_{c,c^{\prime},l^{\prime},m}-w_{c,c^{\prime},l^{\prime},-m}\\ w_{c,c^{\prime},l^{\prime},-m}\quad w_{c,c^{\prime},l^{\prime},m}\end{array}% \right)\cdot\left(\begin{array}[]{l}{\tilde{f}^{ij}_{c^{\prime},l^{\prime},m}}% \\ {\tilde{f}^{ij}_{c^{\prime},l^{\prime},-m}}\end{array}\right)( start_ARRAY start_ROW start_CELL italic_f start_POSTSUPERSCRIPT italic_i italic_j end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c , italic_l , italic_m end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_f start_POSTSUPERSCRIPT italic_i italic_j end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c , italic_l , - italic_m end_POSTSUBSCRIPT end_CELL end_ROW end_ARRAY ) = ∑ start_POSTSUBSCRIPT italic_c start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , italic_l start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( start_ARRAY start_ROW start_CELL italic_w start_POSTSUBSCRIPT italic_c , italic_c start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , italic_l start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , italic_m end_POSTSUBSCRIPT - italic_w start_POSTSUBSCRIPT italic_c , italic_c start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , italic_l start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , - italic_m end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_w start_POSTSUBSCRIPT italic_c , italic_c start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , italic_l start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , - italic_m end_POSTSUBSCRIPT italic_w start_POSTSUBSCRIPT italic_c , italic_c start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , italic_l start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , italic_m end_POSTSUBSCRIPT end_CELL end_ROW end_ARRAY ) ⋅ ( start_ARRAY start_ROW start_CELL over~ start_ARG italic_f end_ARG start_POSTSUPERSCRIPT italic_i italic_j end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , italic_l start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , italic_m end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL over~ start_ARG italic_f end_ARG start_POSTSUPERSCRIPT italic_i italic_j end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , italic_l start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , - italic_m end_POSTSUBSCRIPT end_CELL end_ROW end_ARRAY )

This represents a linear operation on f~c,l,mijsubscriptsuperscript~𝑓𝑖𝑗superscript𝑐superscript𝑙superscript𝑚\tilde{f}^{ij}_{c^{\prime},l^{\prime},m^{\prime}}over~ start_ARG italic_f end_ARG start_POSTSUPERSCRIPT italic_i italic_j end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , italic_l start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , italic_m start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT. By employing this method, high-order tensor products for l=8𝑙8l=8italic_l = 8, 9999 and even 10101010 can be efficiently calculated, which is essential for heavy element systems where the f𝑓fitalic_f, g𝑔gitalic_g or even higher orbitals are used in the LCAO basis for DFT calculations. Finally, the weights for the new SO(2) tensor product method are multiplied by edge-specific parameters, where wc,lijsubscriptsuperscript𝑤𝑖𝑗superscript𝑐superscript𝑙w^{ij}_{c^{\prime},l^{\prime}}italic_w start_POSTSUPERSCRIPT italic_i italic_j end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , italic_l start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT are mapped by an MLP from hidden scalar features 𝐱ij,Lsuperscript𝐱𝑖𝑗𝐿\mathbf{x}^{ij,L}bold_x start_POSTSUPERSCRIPT italic_i italic_j , italic_L end_POSTSUPERSCRIPT as w~c,c,l,mij=wc,c,l,mwc,lijsubscriptsuperscript~𝑤𝑖𝑗𝑐superscript𝑐superscript𝑙𝑚subscript𝑤𝑐superscript𝑐superscript𝑙𝑚subscriptsuperscript𝑤𝑖𝑗superscript𝑐superscript𝑙\tilde{w}^{ij}_{c,c^{\prime},l^{\prime},m}=w_{c,c^{\prime},l^{\prime},m}w^{ij}% _{c^{\prime},l^{\prime}}over~ start_ARG italic_w end_ARG start_POSTSUPERSCRIPT italic_i italic_j end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c , italic_c start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , italic_l start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , italic_m end_POSTSUBSCRIPT = italic_w start_POSTSUBSCRIPT italic_c , italic_c start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , italic_l start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , italic_m end_POSTSUBSCRIPT italic_w start_POSTSUPERSCRIPT italic_i italic_j end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , italic_l start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT. This powerful and efficient tensor product layer facilitates the construction of local interactive updates of the features.

III.3.3 Hidden updates

To construct many-body interactions, as in Fig. 2(b), the node features 𝐧c,lisubscriptsuperscript𝐧𝑖𝑐𝑙\mathbf{n}^{i}_{c,l}bold_n start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c , italic_l end_POSTSUBSCRIPT and hidden tensor features 𝐕c,lijsubscriptsuperscript𝐕𝑖𝑗𝑐𝑙\mathbf{V}^{ij}_{c,l}bold_V start_POSTSUPERSCRIPT italic_i italic_j end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c , italic_l end_POSTSUBSCRIPT would be concatenated and doing tensor product with the projection coefficients of edge shift vector 𝒓ijsubscript𝒓𝑖𝑗\bm{r}_{ij}bold_italic_r start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT on the spherical harmonics functions. The operation is written formally as:

𝐕~c3,l3ij,L=(𝐧i,L1||𝐕ij,L1)c1,l1wc2,l2ij,L𝐘l2ij\tilde{\mathbf{V}}^{ij,L}_{c_{3},l_{3}}=\left(\mathbf{n}^{i,L-1}||\mathbf{V}^{% ij,L-1}\right)_{c_{1},l_{1}}\otimes w_{c_{2},l_{2}}^{ij,L}\mathbf{Y}_{l_{2}}^{ij}over~ start_ARG bold_V end_ARG start_POSTSUPERSCRIPT italic_i italic_j , italic_L end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_POSTSUBSCRIPT = ( bold_n start_POSTSUPERSCRIPT italic_i , italic_L - 1 end_POSTSUPERSCRIPT | | bold_V start_POSTSUPERSCRIPT italic_i italic_j , italic_L - 1 end_POSTSUPERSCRIPT ) start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ⊗ italic_w start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i italic_j , italic_L end_POSTSUPERSCRIPT bold_Y start_POSTSUBSCRIPT italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i italic_j end_POSTSUPERSCRIPT (5)

Unlike most MPNN, the hidden states 𝐱ijsuperscript𝐱𝑖𝑗\mathbf{x}^{ij}bold_x start_POSTSUPERSCRIPT italic_i italic_j end_POSTSUPERSCRIPT and 𝐕c,lijsubscriptsuperscript𝐕𝑖𝑗𝑐𝑙\mathbf{V}^{ij}_{c,l}bold_V start_POSTSUPERSCRIPT italic_i italic_j end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c , italic_l end_POSTSUBSCRIPT in SLEM is only a function of the local environment of centre atom i𝑖iitalic_i, the shift vector of edge ij𝑖𝑗ijitalic_i italic_j, and the information of atom j𝑗jitalic_j. Such a design excludes neighbours of j𝑗jitalic_j into hidden states.

After the tensor production, the output features 𝐕~c,lij,Lsubscriptsuperscript~𝐕𝑖𝑗𝐿𝑐𝑙\tilde{\mathbf{V}}^{ij,L}_{c,l}over~ start_ARG bold_V end_ARG start_POSTSUPERSCRIPT italic_i italic_j , italic_L end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c , italic_l end_POSTSUBSCRIPT will be passed through the gated non-linearity [41], and transformed by an E3linear [42] layer to mix up the information across different channels. The new hidden feature will be multiplied by the weights learned from normalized scalar features to include the radial information explicitly and return as the updated feature 𝐕c,lij,Lsubscriptsuperscript𝐕𝑖𝑗𝐿𝑐𝑙\mathbf{V}^{ij,L}_{c,l}bold_V start_POSTSUPERSCRIPT italic_i italic_j , italic_L end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c , italic_l end_POSTSUBSCRIPT.

The scalar hidden features are updated by mixing the 0th0th0\text{th}0 th order information from 𝐕c,lij,Lsubscriptsuperscript𝐕𝑖𝑗𝐿𝑐𝑙\mathbf{V}^{ij,L}_{c,l}bold_V start_POSTSUPERSCRIPT italic_i italic_j , italic_L end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c , italic_l end_POSTSUBSCRIPT with a latent MLP, which is:

𝐱ij,L=MLP(𝐱ij,L1||𝐕c,l=0ij,L)u(rij)\mathbf{x}^{ij,L}=\textbf{MLP}\left(\mathbf{x}^{ij,L-1}||\mathbf{V}^{ij,L}_{c,% l=0}\right)\cdot u(r_{ij})bold_x start_POSTSUPERSCRIPT italic_i italic_j , italic_L end_POSTSUPERSCRIPT = MLP ( bold_x start_POSTSUPERSCRIPT italic_i italic_j , italic_L - 1 end_POSTSUPERSCRIPT | | bold_V start_POSTSUPERSCRIPT italic_i italic_j , italic_L end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c , italic_l = 0 end_POSTSUBSCRIPT ) ⋅ italic_u ( italic_r start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT ) (6)

Containing the explicit decaying envelope function u(rij)𝑢subscript𝑟𝑖𝑗u(r_{ij})italic_u ( italic_r start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT ), and the many-body interactions of scalar and tensor features, the sclar hiddien states 𝐱ij,Lsuperscript𝐱𝑖𝑗𝐿\mathbf{x}^{ij,L}bold_x start_POSTSUPERSCRIPT italic_i italic_j , italic_L end_POSTSUPERSCRIPT can be very useful to describing the explicit decaying behaviour of each edge irreps feature along the radial distance.

III.3.4 Node updates

The strict local node representation 𝐧c,lij,Lsubscriptsuperscript𝐧𝑖𝑗𝐿𝑐𝑙\mathbf{n}^{ij,L}_{c,l}bold_n start_POSTSUPERSCRIPT italic_i italic_j , italic_L end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c , italic_l end_POSTSUBSCRIPT can be constructed naturally from the many-body interactive tensor features 𝐕c,lij,Lsubscriptsuperscript𝐕𝑖𝑗𝐿𝑐𝑙\mathbf{V}^{ij,L}_{c,l}bold_V start_POSTSUPERSCRIPT italic_i italic_j , italic_L end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c , italic_l end_POSTSUBSCRIPT. We follows the MPNN style to create the message from node-j𝑗jitalic_j to node-i𝑖iitalic_i. Formally:

𝐦c3,l3ij,L=(𝐧i,L1||𝐕ij,L)c1,l1wc2,l2ij,L𝐘l2ij\mathbf{m}^{ij,L}_{c_{3},l_{3}}=\left(\mathbf{n}^{i,L-1}||\mathbf{V}^{ij,L}% \right)_{c_{1},l_{1}}\otimes w_{c_{2},l_{2}}^{ij,L}\mathbf{Y}_{l_{2}}^{ij}bold_m start_POSTSUPERSCRIPT italic_i italic_j , italic_L end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_POSTSUBSCRIPT = ( bold_n start_POSTSUPERSCRIPT italic_i , italic_L - 1 end_POSTSUPERSCRIPT | | bold_V start_POSTSUPERSCRIPT italic_i italic_j , italic_L end_POSTSUPERSCRIPT ) start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ⊗ italic_w start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i italic_j , italic_L end_POSTSUPERSCRIPT bold_Y start_POSTSUBSCRIPT italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i italic_j end_POSTSUPERSCRIPT (7)

Here again, we exclude the neighbouring information of atom-j𝑗jitalic_j in 𝐦c3,l3ij,Lsubscriptsuperscript𝐦𝑖𝑗𝐿subscript𝑐3subscript𝑙3\mathbf{m}^{ij,L}_{c_{3},l_{3}}bold_m start_POSTSUPERSCRIPT italic_i italic_j , italic_L end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_POSTSUBSCRIPT via this partial updates of 𝐕c,lij,Lsubscriptsuperscript𝐕𝑖𝑗𝐿𝑐𝑙\mathbf{V}^{ij,L}_{c,l}bold_V start_POSTSUPERSCRIPT italic_i italic_j , italic_L end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c , italic_l end_POSTSUBSCRIPT. However, all necessary interactions are contained in this format, including the interactions of atom-j𝑗jitalic_j, bond ij𝑖𝑗i-jitalic_i - italic_j, atom-i𝑖iitalic_i, and the environment of i𝑖iitalic_i.

Each message then is passed through a gated activation and E3Linear layer, and weighted separately by weights learnt from the hidden scalar features, by:

𝐧c3,l3ij,L=1Navgj𝒩(i)wc3,l3ij𝐦c3,l3ij,Lsubscriptsuperscript𝐧𝑖𝑗𝐿subscript𝑐3subscript𝑙31subscript𝑁𝑎𝑣𝑔subscript𝑗𝒩𝑖subscriptsuperscript𝑤𝑖𝑗subscript𝑐3subscript𝑙3subscriptsuperscript𝐦𝑖𝑗𝐿subscript𝑐3subscript𝑙3\mathbf{n}^{ij,L}_{c_{3},l_{3}}=\frac{1}{N_{avg}}\sum_{j\in\mathcal{N}(i)}w^{% ij}_{c_{3},l_{3}}\mathbf{m}^{ij,L}_{c_{3},l_{3}}bold_n start_POSTSUPERSCRIPT italic_i italic_j , italic_L end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_POSTSUBSCRIPT = divide start_ARG 1 end_ARG start_ARG italic_N start_POSTSUBSCRIPT italic_a italic_v italic_g end_POSTSUBSCRIPT end_ARG ∑ start_POSTSUBSCRIPT italic_j ∈ caligraphic_N ( italic_i ) end_POSTSUBSCRIPT italic_w start_POSTSUPERSCRIPT italic_i italic_j end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_POSTSUBSCRIPT bold_m start_POSTSUPERSCRIPT italic_i italic_j , italic_L end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_POSTSUBSCRIPT (8)

The weights here are different from those in hidden updates since it is learnt from 𝐱ij,Lsuperscript𝐱𝑖𝑗𝐿\mathbf{x}^{ij,L}bold_x start_POSTSUPERSCRIPT italic_i italic_j , italic_L end_POSTSUPERSCRIPT directly without passing through a normalization. Therefore, the absolute radial decay is enforced in the weights, giving strong priors that the message from a closer atom is often more important while maintaining flexibility. Meanwhile, combining the updates of hidden scalar 𝐱ij,Lsuperscript𝐱𝑖𝑗𝐿\mathbf{x}^{ij,L}bold_x start_POSTSUPERSCRIPT italic_i italic_j , italic_L end_POSTSUPERSCRIPT, we can see the weights wc3,l3ijsubscriptsuperscript𝑤𝑖𝑗subscript𝑐3subscript𝑙3w^{ij}_{c_{3},l_{3}}italic_w start_POSTSUPERSCRIPT italic_i italic_j end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_POSTSUBSCRIPT are dependent on the features 𝐕ij,Lsuperscript𝐕𝑖𝑗𝐿\mathbf{V}^{ij,L}bold_V start_POSTSUPERSCRIPT italic_i italic_j , italic_L end_POSTSUPERSCRIPT and 𝐧i,L1superscript𝐧𝑖𝐿1\mathbf{n}^{i,L-1}bold_n start_POSTSUPERSCRIPT italic_i , italic_L - 1 end_POSTSUPERSCRIPT, as shown in Eq.6. Such style aligns surprisingly with the equivariant graph attention mechanism [16, 17] that shows powerful expressibility in various tasks. Here wc3,l3ijsubscriptsuperscript𝑤𝑖𝑗subscript𝑐3subscript𝑙3{w}^{ij}_{c_{3},l_{3}}italic_w start_POSTSUPERSCRIPT italic_i italic_j end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_POSTSUBSCRIPT corresponding to an attention score computed from 𝐕ij,Lsuperscript𝐕𝑖𝑗𝐿\mathbf{V}^{ij,L}bold_V start_POSTSUPERSCRIPT italic_i italic_j , italic_L end_POSTSUPERSCRIPT and 𝐧i,L1superscript𝐧𝑖𝐿1\mathbf{n}^{i,L-1}bold_n start_POSTSUPERSCRIPT italic_i , italic_L - 1 end_POSTSUPERSCRIPT. Therefore, through this update, the node features’ dependencies are kept strictly local.

III.3.5 Edge updates

The locality of edge features 𝐞c,lij,Lsubscriptsuperscript𝐞𝑖𝑗𝐿𝑐𝑙\mathbf{e}^{ij,L}_{c,l}bold_e start_POSTSUPERSCRIPT italic_i italic_j , italic_L end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c , italic_l end_POSTSUBSCRIPT can be naturally preserved as long as the node features are localized strictly. By mixing the information of node features on both sides, localized updates can be formulated as:

𝐞~c3,l3ij,L=(𝐧i,L1𝐕ij,L𝐧j,L1)c1,l1wc2,l2ij,L𝐘l2ijsubscriptsuperscript~𝐞𝑖𝑗𝐿subscript𝑐3subscript𝑙3tensor-productsubscriptsuperscript𝐧𝑖𝐿1normsuperscript𝐕𝑖𝑗𝐿superscript𝐧𝑗𝐿1subscript𝑐1subscript𝑙1superscriptsubscript𝑤subscript𝑐2subscript𝑙2𝑖𝑗𝐿superscriptsubscript𝐘subscript𝑙2𝑖𝑗\displaystyle\tilde{\mathbf{e}}^{ij,L}_{c_{3},l_{3}}=\left(\mathbf{n}^{i,L-1}|% |\mathbf{V}^{ij,L}||\mathbf{n}^{j,L-1}\right)_{c_{1},l_{1}}\otimes w_{c_{2},l_% {2}}^{ij,L}\mathbf{Y}_{l_{2}}^{ij}over~ start_ARG bold_e end_ARG start_POSTSUPERSCRIPT italic_i italic_j , italic_L end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_POSTSUBSCRIPT = ( bold_n start_POSTSUPERSCRIPT italic_i , italic_L - 1 end_POSTSUPERSCRIPT | | bold_V start_POSTSUPERSCRIPT italic_i italic_j , italic_L end_POSTSUPERSCRIPT | | bold_n start_POSTSUPERSCRIPT italic_j , italic_L - 1 end_POSTSUPERSCRIPT ) start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ⊗ italic_w start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i italic_j , italic_L end_POSTSUPERSCRIPT bold_Y start_POSTSUBSCRIPT italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i italic_j end_POSTSUPERSCRIPT

Similarly, the updated edge features are processed via a gated activation and an E3Linear layer, and then multiplied with weights learnt from the hidden scalar features (without normalization) as:

𝐞c3,l3ij,L=wc3,l3ij𝐞~c3,l3ij,Lsubscriptsuperscript𝐞𝑖𝑗𝐿subscript𝑐3subscript𝑙3subscriptsuperscript𝑤𝑖𝑗subscript𝑐3subscript𝑙3subscriptsuperscript~𝐞𝑖𝑗𝐿subscript𝑐3subscript𝑙3\displaystyle\mathbf{e}^{ij,L}_{c_{3},l_{3}}=w^{ij}_{c_{3},l_{3}}\tilde{% \mathbf{e}}^{ij,L}_{c_{3},l_{3}}bold_e start_POSTSUPERSCRIPT italic_i italic_j , italic_L end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_POSTSUBSCRIPT = italic_w start_POSTSUPERSCRIPT italic_i italic_j end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_POSTSUBSCRIPT over~ start_ARG bold_e end_ARG start_POSTSUPERSCRIPT italic_i italic_j , italic_L end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_POSTSUBSCRIPT

Here we reframe the updating rules of SLEM with the MPNN framework, which looks like this:

𝐕ij,Lsuperscript𝐕𝑖𝑗𝐿\displaystyle\mathbf{V}^{ij,L}bold_V start_POSTSUPERSCRIPT italic_i italic_j , italic_L end_POSTSUPERSCRIPT =𝒱L(𝐧i,L1,𝐕ij,L1)absentsubscript𝒱𝐿superscript𝐧𝑖𝐿1superscript𝐕𝑖𝑗𝐿1\displaystyle=\mathcal{V}_{L}\left(\mathbf{n}^{i,L-1},\mathbf{V}^{ij,L-1}\right)= caligraphic_V start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT ( bold_n start_POSTSUPERSCRIPT italic_i , italic_L - 1 end_POSTSUPERSCRIPT , bold_V start_POSTSUPERSCRIPT italic_i italic_j , italic_L - 1 end_POSTSUPERSCRIPT )
𝐞ij,Lsuperscript𝐞𝑖𝑗𝐿\displaystyle\mathbf{e}^{ij,L}bold_e start_POSTSUPERSCRIPT italic_i italic_j , italic_L end_POSTSUPERSCRIPT =𝒩L(𝐧i,L1,𝐕ij,L,𝐧j,L1)absentsubscript𝒩𝐿superscript𝐧𝑖𝐿1superscript𝐕𝑖𝑗𝐿superscript𝐧𝑗𝐿1\displaystyle=\mathcal{N}_{L}\left(\mathbf{n}^{i,L-1},\mathbf{V}^{ij,L},% \mathbf{n}^{j,L-1}\right)= caligraphic_N start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT ( bold_n start_POSTSUPERSCRIPT italic_i , italic_L - 1 end_POSTSUPERSCRIPT , bold_V start_POSTSUPERSCRIPT italic_i italic_j , italic_L end_POSTSUPERSCRIPT , bold_n start_POSTSUPERSCRIPT italic_j , italic_L - 1 end_POSTSUPERSCRIPT )
𝐦ij,Lsuperscript𝐦𝑖𝑗𝐿\displaystyle\mathbf{m}^{ij,L}bold_m start_POSTSUPERSCRIPT italic_i italic_j , italic_L end_POSTSUPERSCRIPT =ML(𝐧i,L1,𝐕ij,L)absentsubscript𝑀𝐿superscript𝐧𝑖𝐿1superscript𝐕𝑖𝑗𝐿\displaystyle=M_{L}\left(\mathbf{n}^{i,L-1},\mathbf{V}^{ij,L}\right)= italic_M start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT ( bold_n start_POSTSUPERSCRIPT italic_i , italic_L - 1 end_POSTSUPERSCRIPT , bold_V start_POSTSUPERSCRIPT italic_i italic_j , italic_L end_POSTSUPERSCRIPT )
𝐧i,Lsuperscript𝐧𝑖𝐿\displaystyle\mathbf{n}^{i,L}bold_n start_POSTSUPERSCRIPT italic_i , italic_L end_POSTSUPERSCRIPT =UL(𝐧i,L1,j𝒩(i)𝐦ij,L)absentsubscript𝑈𝐿superscript𝐧𝑖𝐿1subscript𝑗𝒩𝑖superscript𝐦𝑖𝑗𝐿\displaystyle=U_{L}\left(\mathbf{n}^{i,L-1},\sum_{j\in\mathcal{N}(i)}\mathbf{m% }^{ij,L}\right)= italic_U start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT ( bold_n start_POSTSUPERSCRIPT italic_i , italic_L - 1 end_POSTSUPERSCRIPT , ∑ start_POSTSUBSCRIPT italic_j ∈ caligraphic_N ( italic_i ) end_POSTSUBSCRIPT bold_m start_POSTSUPERSCRIPT italic_i italic_j , italic_L end_POSTSUPERSCRIPT )

Where 𝒱Lsubscript𝒱𝐿\mathcal{V}_{L}caligraphic_V start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT is the neural network for hidden feature construction. This update scheme constructs many-body interactions to build equivariant edge and node features while preserving the absolute locality by excluding atoms outside the constant cutoff radius. Such a scheme, in principle, would have much better transferability, data-efficiency, as well as scalability. Most importantly, it is possible to parallelize such update computations, allowing for training and inference on multiple devices. This is also vital for inferencing on very large atomic structures. The following section will focus on validating the effectiveness of this framework via learning equivariant DFT Hamiltonians and density matrices and overlap, which have very high-order tensors and need extremely high prediction accuracy. We demonstrate our method is much better than the previously reported ones in training and inference speed, transferability and the potential power to parralize the inference on multi-GPU devices.

Table 1: MAE (in meV) for Hamiltonian matrix predictions using SLEM and other methods on materials with DFT-LCAO basis up to d𝑑ditalic_d, f𝑓fitalic_f, and g𝑔gitalic_g orbitals. Numbers in parentheses indicate parameter count. Data for HfO2 in DeePH-E3 and for both GaN and HfO2 in HamGNN are absent due to out-of-memory errors.
Systems with LCAO-basis up to d𝑑ditalic_d-orbitals
Material SLEM DeepH-E3 HamGNN
(0.7M) (4.5M) ( 1.0 M) ( 4.5M) (10M)
MoS2 0.34 0.14 0.46 0.55 0.63
Graphene 0.26 0.14 0.40 0.28 0.40
Si(300K) 0.10 0.07 0.16 0.10 9.0
Si(600K) 0.20 0.13 0.19 0.14 9.7
Systems with LCAO-basis up to f𝑓fitalic_f and g𝑔gitalic_g-orbitals
SLEM DeepH-E3
(1.7M) (1.9M)
GaN 0.21 0.87
HfO2 0.28 -
Table 2: MAE for predicting density matrix using SLEM on materials with DFT-LCAO basis up to d𝑑ditalic_d, f𝑓fitalic_f, and g𝑔gitalic_g orbitals. Model settings align with those in Table LABEL:hamiltonian.
SLEM charge density model
Materials Silicon GaN HfO2
MAE 8.9e-5 2.3e-5 3.9e-5

4.   Results

IV.1 Benchmark the accuracy and data-efficiency

Refer to caption
Figure 3: Comparison of band structures for a Si MD trajectory snapshot: SLEM prediction vs. DFT calculation. Predicted band structures are obtained from either diagonalization of the predicted Hamiltonian or NSCF DFT calculation using predicted charge density, yielding indistinguishable results. Inset: Visualization of charge density distribution for the same structure.

We benchmark our model’s performance in fitting Hamiltonian, density matrix, and overlap matrix using a diverse dataset. For the Hamiltonian, the datasets include systems with up to d𝑑ditalic_d orbitals, encompassing 2D systems like single-layer MoS2 and graphene from reported datasets [11], as well as 3D bulk silicon generated in this work. Given SLEM’s proficiency with high-order spherical tensors, we also train it on our generated datasets of bulk GaN and HfO2 systems, which include f𝑓fitalic_f and g𝑔gitalic_g orbitals, respectively. Structures in the Si, GaN, and HfO2 systems are sampled via molecular dynamics using neural network potentials [34]. For density matrix and overlap matrix evaluations, we test the SLEM model exclusively on the datasets of Si, GaN, and HfO2, as the reported datasets for MoS2 and graphene lack density and overlap matrix data.

As shown in Table LABEL:hamiltonian, our SLEM model achieves state-of-the-art accuracy in Hamiltonian prediction, exhibiting the lowest mean absolute error (MAE) across all test systems compared to referenced methods. Notably, high accuracy is achieved with a relatively small model size (only 0.7 million (M) trainable parameters). Fig. 3 illustrates the band structure of silicon structures computed directly from the SLEM-predicted Hamiltonian, where the eigenvalues are indistinguishable from those obtained using DFT. For the density matrix, fitting results are presented in Table LABEL:density. The results demonstrate very high accuracy (order of 1e-5), approaching the machine precision limit of float32 numbers. In Fig. 3, we use the trained model to predict the density matrix and visualize its real-space distribution. This capability is particularly important for applications such as charge distribution analysis or tracking electron transfer via methods like differential charge density. Furthermore, the predicted density can be directly used for non-self-consistent field (NSCF) DFT calculations. The resulting band structure for silicon, as an example, is highly accurate and matches the DFT output, with a MAE of only 1.09 meV in eigenvalues compared to self-consistent DFT results. The overlap matrix, represented by invariant parameters in our SLEM model, achieves exceptionally high accuracy as demonstrated in Table LABEL:overlap, approaching the machine precision limit for float32 numbers. Notably, our simplified parameterization enables this high accuracy with only a minimal increase in model complexity. For instance, in a silicon model designed to fit only the Hamiltonian, our typical parameter count is 0.7 M. The inclusion of overlap matrix prediction adds merely 0.01 M parameters, representing an increase of only about 1.4% in total model size.

Table 3: MAE for predicting overlap matrix using SLEM’s parameterization on materials with DFT-LCAO basis up to d𝑑ditalic_d, f𝑓fitalic_f, and g𝑔gitalic_g orbitals. Model settings align with those in Table LABEL:hamiltonian.
SLEM overlap prediction
Materials Silicon GaN HfO2
MAE 5.6e-5 4.7e-5 6.3e-5
Table 4: Comparison of validation MAE (in meV) for SLEM model and literature methods trained on randomly split datasets [11] with varying training ratios. Model settings align with those in Table LABEL:hamiltonian.
MoS2
Partition 100% 80% 60% 40% 20%
SLEM 0.34 0.37 0.39 0.37 0.37
DeePH-E3 0.46 0.72 0.84 1.03 1.46
HamGNN 1.11 1.15 1.38 1.65 1.68
Graphene
Partition 100% 80% 60% 40% 20%
SLEM 0.26 0.26 0.27 0.21 0.26
DeePH-E3 0.40 0.30 0.33 0.36 0.60
HamGNN 0.41 0.32 0.37 0.42 0.60

Additionally, due to the strict localization scheme, our SLEM model demonstrates superior data efficiency. Specifically, training the SLEM model requires fewer data points generated via DFT calculations. To quantify the data-efficiency performance, we designed an experiment by randomly splitting the dataset into subsets containing 20%, 40%, 60%, and 80% of the original training data. We then trained the strictly local SLEM model and other methods using these subsets and validated their performance on the same validation sets. The results, presented in Table LABEL:data-efficiency, illustrate the remarkable data-efficiency of our method. Unlike the other methods, which showed a considerable decrease in accuracy with reduced data, our model maintained high accuracy across all subsets. This indicates that using SLEM allows users to generate a smaller, cost-effective training set. Furthermore, the excellent data efficiency and transferability of SLEM make it particularly suitable as a backbone model for developing a universal DFT model, especially for systems involving heavy elements.

Refer to caption
Figure 4: Comparison of time and memory consumption for different tensor-product implementations. (a) Time consumption vs. angular momentum (l𝑙litalic_l) for different models, including the SO(2)-based SLEM model (triangles) with and without radial part (r𝑟ritalic_r), DeepH-E3 (cross) [8], and E3NN (square) [42] models. Inset: Log-scale fit with slopes of 1.2 for the SLEM model and 3.7 for the other two models. (b) Memory consumption vs. l𝑙litalic_l. The SLEM model demonstrates over two orders of magnitude improvement in both time and memory efficiency compared to DeepH-E3 and E3NN.

IV.2 Efficiency and Scalability

Many interesting and important properties in material science, chemistry and biology appear with systems composed of heavy atoms. The inclusion of heavy atoms would introduce very high-order spherical tensors when representing quantum operators such as the DFT Hamiltonian. Therefore, scaling to such systems would be quite a challenge because the tensor product used to construct complex spherical tensors scales as O(L6)𝑂superscript𝐿6O(L^{6})italic_O ( italic_L start_POSTSUPERSCRIPT 6 end_POSTSUPERSCRIPT ). It is very hard to train and inference on systems with heavy atoms using conventional tensor production. Moreover, inferring large material systems while training with small structures is particularly valuable. It therefore requires parallelising the model inference by assigning partitions of the whole atomic structure to multiple GPU workers. However, this is very difficult for most of the currently available models. Since the increasing receptive fields from the iterative updates of the graph would increase the minimal size of each partitioned subgraph, which makes such partition less effective. The SLEM model addresses these challenges by efficiently constructing high-order tensor products and assisting parallelization through its strict locality.

Firstly, for efficiency, the implementation of SO(2) convolution reduces the tensor product computational complexity from O(L6)𝑂superscript𝐿6O(L^{6})italic_O ( italic_L start_POSTSUPERSCRIPT 6 end_POSTSUPERSCRIPT ) to O(L3)𝑂superscript𝐿3O(L^{3})italic_O ( italic_L start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT ), which is then further reduced by the parallelization of matrix operations to near O(L)𝑂𝐿O(L)italic_O ( italic_L ) benefiting from PyTorch. In Fig. 4, we compare the wall time and GPU memory consumed by tensor product operations using SO(2) convolution and the normal method employed in DeepH-E3 [8] and E3NN [42]. Clearly, tensor production via SO(2) convolution is far more efficient than conventional methods. This reduction of cost makes our method very efficient, which is capable of handling all possible basis choices in LCAO DFT. We also record the memory and time required when training on some of the typical systems. The results are displayed in Figure.5. We can see, that in all systems, our model trains faster in time while requiring less memory. Notably, the larger the basis set, the greater the advantage our method has in both memory and time efficiency.

Refer to caption
Figure 5: Comparison of training time per iteration and memory consumption among SLEM, DeepH-E3 [8], and HamGNN [10]. For GaN, HamGNN data is absent due to out-of-memory error.

Additionally, the strict local design assists the parallelization greatly. Since the edge and node features in the SLEM model are constructed locally, each node and edge feature is only dependent on the neighbour subgraph of the whole system, as:

𝐧i,Lsuperscript𝐧𝑖𝐿\displaystyle\mathbf{n}^{i,L}bold_n start_POSTSUPERSCRIPT italic_i , italic_L end_POSTSUPERSCRIPT =UL(𝐧i,L1,j𝒩(i)𝐦i,L)absentsubscript𝑈𝐿superscript𝐧𝑖𝐿1subscript𝑗𝒩𝑖superscript𝐦𝑖𝐿\displaystyle=U_{L}\left(\mathbf{n}^{i,L-1},\sum_{j\in\mathcal{N}(i)}\mathbf{m% }^{i,L}\right)= italic_U start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT ( bold_n start_POSTSUPERSCRIPT italic_i , italic_L - 1 end_POSTSUPERSCRIPT , ∑ start_POSTSUBSCRIPT italic_j ∈ caligraphic_N ( italic_i ) end_POSTSUBSCRIPT bold_m start_POSTSUPERSCRIPT italic_i , italic_L end_POSTSUPERSCRIPT ) (9)
=U(Zi,{𝐘ij,Zj}jN(i))absent𝑈subscript𝑍𝑖subscriptsuperscript𝐘𝑖𝑗subscript𝑍𝑗𝑗𝑁𝑖\displaystyle=U(Z_{i},\{\mathbf{Y}^{ij},Z_{j}\}_{j\in N(i)})= italic_U ( italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , { bold_Y start_POSTSUPERSCRIPT italic_i italic_j end_POSTSUPERSCRIPT , italic_Z start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_j ∈ italic_N ( italic_i ) end_POSTSUBSCRIPT )

Therefore, it is possible to divide the whole atomic system into sub-graphs and compute their node and edge features independently on different devices. This is extremely important when expanding the simulation power of DFT to very large systems. In practice, we find that for HfO2 with 4s2p2d2f1g4𝑠2𝑝2𝑑2𝑓1𝑔4s2p2d2f1g4 italic_s 2 italic_p 2 italic_d 2 italic_f 1 italic_g basis, a typical model of 1.7M parameters can only predict the quantum tensors of structures up to 103superscript10310^{3}10 start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT atoms on devices with 32GB memory. Despite the linear dependency, the inference on a system with 104107similar-tosuperscript104superscript10710^{4}\sim 10^{7}10 start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT ∼ 10 start_POSTSUPERSCRIPT 7 end_POSTSUPERSCRIPT would require a memory with potentially over 300 GB, which is only available when parallelised to multiple GPUs. Therefore, a strictly localized model such as SLEM holds significant potential for expanding simulation system sizes.

Discussion

This work proposed SLEM, the Strictly Local Equalvariant Message-Passing model for the prediction of quantum operator representations, as well as a systematic view of the parameterization of equivariant quantum operators. The strict local design of SLEM builds many-body dependence without expanding the effective receptive field, contributing to excellent accuracy, data efficiency and transferability. The integration of SO(2) convolution [19] enables rapid and memory-efficient tensor production, which is crucial for modeling heavy atoms. Furthermore, SLEM’s locality potentially allows for extension to extremely large atomic systems through parallelization over partitioned atomic graphs, thus enabling first-principles electronic structure simulations at device-level scales. Beyond its modeling capabilities, SLEM intrinsically supports the representation of multiple quantum operators, including the Hamiltonian, density matrix, and overlap matrix. By parameterizing the overlap matrix using a two-center integration formula, it is transformed into invariant quantities readily obtainable from SLEM’s initialized two-body embedding features. This approach significantly reduces computational costs and minimizes the model’s reliance on DFT software post-training.

While SLEM demonstrates efficacy across various applications, several areas warrant further investigation:

  • Sampling Methodology for Active Learning: Scientific computation tasks require high confidence in calculated results, which can be challenging for data-driven approaches. Developing a robust sampling workflow for active learning is essential for reliability. While techniques like uncertainty-driven sampling with Gaussian regression [43] or model ensembles [44] have addressed confidence issues in machine learning force fields, designing a sampling workflow specifically for quantum operator models remains an open question.

  • Software Integration for Post-Processing: Integrating the model with existing software is vital, especially for high-throughput calculations and large atomic systems beyond conventional DFT capabilities. An efficient, parallelizable solver for extracting physical quantities from the model’s predictions is highly beneficial. While some software based on stochastic techniques  [45, 46] shows promise, an open-sourced and highly-optimized solution for non-orthogonal bases remains unavailable.

Future research addressing these challenges will further enhance the applicability and reliability of SLEM and similar quantum operator models in computational materials science and chemistry.

Code availability

The implementation of the SLEM model and its semi-local variant (named LEM) are open-source and accessible via the DeePTB GitHub repository. To facilitate the integration of DFT outputs with machine learning models, we have developed and released a supplementary tool, dftio. This tool enables parallel parsing of DFT outputs into a machine-readable data format, enhancing the efficiency of data preparation for model training and analysis.

Acknowledgements.
We appreciate the insightful discussions with Siyuan Liu, Feitong Song, Jijie Zou, and Duo Zhang. This research utilized the computational resources of the Bohrium Cloud Platform (https://bohrium.dp.tech) provided by DP Technology.

References

  • Hohenberg and Kohn [1964] P. Hohenberg and W. Kohn, Inhomogeneous electron gas, Phys. Rev. 136, B864 (1964).
  • Kohn and Sham [1965] W. Kohn and L. J. Sham, Self-consistent equations including exchange and correlation effects, Phys. Rev. 140, A1133 (1965).
  • Jones [2015] R. O. Jones, Density functional theory: Its origins, rise to prominence, and future, Rev. Mod. Phys. 87, 897 (2015).
  • Unke et al. [2021] O. Unke, M. Bogojeski, M. Gastegger, M. Geiger, T. Smidt, and K.-R. Müller, SE(3)-equivariant prediction of molecular wavefunctions and electronic densities, Advances in Neural Information Processing Systems 34, 14434 (2021).
  • Yu et al. [2023] H. Yu, Z. Xu, X. Qian, X. Qian, and S. Ji, Efficient and equivariant graph networks for predicting quantum hamiltonian, in International Conference on Machine Learning (PMLR, 2023) pp. 40412–40424.
  • Dong et al. [2024] X. Dong, E. Gull, and L. Wang, Equivariant neural network for green’s functions of molecules and materials, Phys. Rev. B 109, 075112 (2024).
  • Yin et al. [2024] S. Yin, X. Pan, F. Wang, F. Wu, and L. He, A framework of so(3)-equivariant non-linear representation learning and its application to electronic-structure hamiltonian prediction, arXiv:2405.05722  (2024).
  • Gong et al. [2023] X. Gong, H. Li, N. Zou, R. Xu, W. Duan, and Y. Xu, General framework for E(3)-equivariant neural network representation of density functional theory Hamiltonian, Nature Communications 14, 2848 (2023).
  • Nigam et al. [2022] J. Nigam, M. J. Willatt, and M. Ceriotti, Equivariant representations for molecular Hamiltonians and N-center atomic-scale properties, The Journal of Chemical Physics 156, 014115 (2022).
  • Zhong et al. [2023] Y. Zhong, H. Yu, M. Su, X. Gong, and H. Xiang, Transferable equivariant graph neural networks for the hamiltonians of molecules and solids, npj Computational Materials 9, 182 (2023).
  • Li et al. [2022] H. Li, Z. Wang, N. Zou, M. Ye, R. Xu, X. Gong, W. Duan, and Y. Xu, Deep-learning density functional theory hamiltonian for efficient ab initio electronic-structure calculation, Nature Computational Science 2, 367 (2022).
  • Han et al. [2024] J. Han, J. Cen, L. Wu, Z. Li, X. Kong, R. Jiao, Z. Yu, T. Xu, F. Wu, Z. Wang, et al., A survey of geometric graph neural networks: Data structures, models and applications, arXiv:2403.00485  (2024).
  • Musaelian et al. [2023] A. Musaelian, S. Batzner, A. Johansson, L. Sun, C. J. Owen, M. Kornbluth, and B. Kozinsky, Learning local equivariant representations for large-scale atomistic dynamics, Nature Communications 14, 579 (2023).
  • Batatia et al. [2022] I. Batatia, D. P. Kovacs, G. Simm, C. Ortner, and G. Csányi, MACE: Higher order equivariant message passing neural networks for fast and accurate force fields, Advances in Neural Information Processing Systems 35, 11423 (2022).
  • Joshi et al. [2023] C. K. Joshi, C. Bodnar, S. V. Mathis, T. Cohen, and P. Lio, On the expressive power of geometric graph neural networks, in International conference on machine learning (PMLR, 2023) pp. 15330–15355.
  • Liao and Smidt [2022] Y.-L. Liao and T. Smidt, Equiformer: Equivariant graph attention transformer for 3d atomistic graphs, arXiv:2206.11990  (2022).
  • Liao et al. [2023] Y.-L. Liao, B. Wood, A. Das, and T. Smidt, Equiformerv2: Improved equivariant transformer for scaling to higher-degree representations, arXiv:2306.12059  (2023).
  • Simeon and De Fabritiis [2024] G. Simeon and G. De Fabritiis, TensorNet: Cartesian tensor representations for efficient learning of molecular potentials, Advances in Neural Information Processing Systems 36 (2024).
  • Passaro and Zitnick [2023] S. Passaro and C. L. Zitnick, Reducing SO(3) convolutions to SO(2) for efficient equivariant gnns, in International Conference on Machine Learning (PMLR, 2023) pp. 27420–27438.
  • Zitnick et al. [2022] L. Zitnick, A. Das, A. Kolluru, J. Lan, M. Shuaibi, A. Sriram, Z. Ulissi, and B. Wood, Spherical channels for modeling atomic interactions, Advances in Neural Information Processing Systems 35, 8054 (2022).
  • Huckel and Debye [1923] E. Huckel and P. Debye, Zur theorie der elektrolyte. i. gefrierpunktserniedrigung und verwandte erscheinungen, Phys. Z 24, 185 (1923).
  • Resta [1977] R. Resta, Thomas-fermi dielectric screening in semiconductors, Phys. Rev. B 16, 2717 (1977).
  • Ninno et al. [2006] D. Ninno, F. Trani, G. Cantele, K. J. Hameeuw, G. Iadonisi, E. Degoli, and S. Ossicini, Thomas-Fermi model of electronic screening in semiconductor nanocrystals, Europhysics Letters 74, 519 (2006).
  • Gu et al. [2023] Q. Gu, Z. Zhouyin, S. K. Pandey, P. Zhang, L. Zhang, and W. E, DeePTB: A deep learning-based tight-binding approach with ab𝑎𝑏abitalic_a italic_b initio𝑖𝑛𝑖𝑡𝑖𝑜initioitalic_i italic_n italic_i italic_t italic_i italic_o accuracy, arXiv:2307.04638  (2023).
  • Slater and Koster [1954] J. C. Slater and G. F. Koster, Simplified LCAO method for the periodic potential problem, Phys. Rev. 94, 1498 (1954).
  • Gilmer et al. [2017] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl, Neural message passing for quantum chemistry, in International conference on machine learning (PMLR, 2017) pp. 1263–1272.
  • Schütt et al. [2020] K. T. Schütt, S. Chmiela, O. A. Von Lilienfeld, A. Tkatchenko, K. Tsuda, and K.-R. Müller, Machine learning meets quantum physics, Lecture Notes in Physics  (2020).
  • Schütt et al. [2018] K. T. Schütt, H. E. Sauceda, P.-J. Kindermans, A. Tkatchenko, and K.-R. Müller, SchNet – A deep learning architecture for molecules and materials, The Journal of Chemical Physics 148, 241722 (2018).
  • Satorras et al. [2021] V. G. Satorras, E. Hoogeboom, and M. Welling, E(n) equivariant graph neural networks, in International conference on machine learning (PMLR, 2021) pp. 9323–9332.
  • Behler and Parrinello [2007] J. Behler and M. Parrinello, Generalized neural-network representation of high-dimensional potential-energy surfaces, Phys. Rev. Lett. 98, 146401 (2007).
  • Bartók et al. [2010] A. P. Bartók, M. C. Payne, R. Kondor, and G. Csányi, Gaussian approximation potentials: The accuracy of quantum mechanics, without the electrons, Phys. Rev. Lett. 104, 136403 (2010).
  • Thompson et al. [2015] A. Thompson, L. Swiler, C. Trott, S. Foiles, and G. Tucker, Spectral neighbor analysis method for automated generation of quantum-accurate interatomic potentials, Journal of Computational Physics 285, 316 (2015).
  • Zhang et al. [2018a] L. Zhang, J. Han, H. Wang, R. Car, and W. E, Deep potential molecular dynamics: A scalable model with the accuracy of quantum mechanics, Phys. Rev. Lett. 120, 143001 (2018a).
  • Wang et al. [2018] H. Wang, L. Zhang, J. Han, and W. E, DeePMD-kit: A deep learning package for many-body potential energy representation and molecular dynamics, Computer Physics Communications 228, 178 (2018).
  • Zhang et al. [2018b] L. Zhang, J. Han, H. Wang, W. Saidi, R. Car, and W. E, End-to-end symmetry preserving inter-atomic potential energy model for finite and extended systems, Advances in Neural Information Processing Systems 31 (2018b).
  • Thomas et al. [2018] N. Thomas, T. Smidt, S. Kearnes, L. Yang, L. Li, K. Kohlhoff, and P. Riley, Tensor field networks: Rotation-and translation-equivariant neural networks for 3d point clouds, arXiv:1802.08219  (2018).
  • Weiler et al. [2018] M. Weiler, M. Geiger, M. Welling, W. Boomsma, and T. S. Cohen, 3D Steerable CNNs: Learning rotationally equivariant features in volumetric data, Advances in Neural Information Processing Systems 31 (2018).
  • Kondor et al. [2018] R. Kondor, Z. Lin, and S. Trivedi, Clebsch–Gordan nets: a fully fourier space spherical convolutional neural network, Advances in Neural Information Processing Systems 31 (2018).
  • Kondor [2018] R. Kondor, N-body networks: a covariant hierarchical neural network architecture for learning atomic potentials, arXiv:1803.01588  (2018).
  • Podolskiy and Vogl [2004] A. V. Podolskiy and P. Vogl, Compact expression for the angular dependence of tight-binding Hamiltonian matrix elements, Phys. Rev. B 69, 233101 (2004).
  • Batzner et al. [2022] S. Batzner, A. Musaelian, L. Sun, M. Geiger, J. P. Mailoa, M. Kornbluth, N. Molinari, T. E. Smidt, and B. Kozinsky, E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials, Nature Communications 13, 2453 (2022).
  • Geiger and Smidt [2022] M. Geiger and T. Smidt, e3nn: Euclidean neural networks, arXiv:2207.09453  (2022).
  • Vandermause et al. [2020] J. Vandermause, S. B. Torrisi, S. Batzner, Y. Xie, L. Sun, A. M. Kolpak, and B. Kozinsky, On-the-fly active learning of interpretable Bayesian force fields for atomistic rare events, npj Computational Materials 6, 1 (2020).
  • Zhang et al. [2020] Y. Zhang, H. Wang, W. Chen, J. Zeng, L. Zhang, H. Wang, and W. E, DP-GEN: A concurrent learning platform for the generation of reliable deep learning based potential energy models, Computer Physics Communications 253, 107206 (2020).
  • Li et al. [2023] Y. Li, Z. Zhan, X. Kuang, Y. Li, and S. Yuan, TBPLaS: A tight-binding package for large-scale simulation, Computer Physics Communications 285, 108632 (2023).
  • João et al. [2020] S. M. João, M. Anđelković, L. Covaci, T. G. Rappoport, J. M. Lopes, and A. Ferreira, Kite: high-performance accurate modelling of electronic structure and response functions of large molecules, disordered crystals and heterostructures, Royal Society Open Science 7, 191809 (2020).