LSPI: Heterogeneous Graph Neural Network Classification Aggregation Algorithm Based on Size Neighbor Path Identification

Yufei Zhao, Shiduo Wang, Hua Duan
Abstract

Existing heterogeneous graph neural networks (HGNNs) mostly rely on meta-paths to capture the rich semantic information contained in heterogeneous graphs (also known as heterogeneous information networks, HINs), but most of these HGNNs focus on different ways of aggregating features and ignore the properties of the meta-paths themselves. This paper studies meta-paths in three commonly used datasets and finds that the number of neighbors connected by different meta-paths differs greatly. At the same time, the noise contained in large neighbor paths adversely affects model performance. This paper therefore proposes a Heterogeneous Graph Neural Network Classification and Aggregation Algorithm Based on Large and Small Neighbor Path Identification (LSPI). LSPI first divides the meta-paths into large and small neighbor paths with a path discriminator. To reduce noise interference in large neighbor paths, LSPI selects neighbor nodes with higher similarity from both the topology and the feature perspective, and then aggregates the small neighbor paths and the filtered large neighbor paths through different graph convolution components to obtain feature information under different subgraphs. LSPI then uses subgraph-level attention to fuse the feature information from the different subgraphs and generate the final node embedding. Finally, extensive experiments verify the superiority of the method and also yield suggestions on the number of nodes to retain in large neighbor paths. The complete reproducible code and data have been published at: https://github.com/liuhua811/LSPIA.

keywords:
Heterogeneous graph neural network, Node filtering, Graph embedding, Graph representation learning
Affiliation: College of Mathematics and Systems Science, Shandong University of Science and Technology, Qingdao 266590, Shandong, China

1 Introduction

With the rapid development of neural networks, their real-world applications are quickly gaining popularity. However, traditional neural networks only work on Euclidean data, while non-Euclidean data such as heterogeneous graphs are also prevalent in the real world. To improve the ability of neural networks to capture features from non-Euclidean data such as heterogeneous graphs, heterogeneous graph neural networks have attracted the attention of many researchers in recent years. They have already been applied in areas such as academic networks [1, 2], transportation systems [3], drug response [4] and physical systems [5]. Effective feature mining with heterogeneous graph neural networks therefore has important application value and economic significance.

Many existing heterogeneous graph neural networks have achieved excellent performance on real-world heterogeneous graphs [6, 7, 8, 9]. Because of the heterogeneity of these graphs, nodes of the same type tend not to be directly connected, so these models mostly capture same-type neighbor nodes with the help of meta-paths. A meta-path is a form of connectivity specific to HINs, through which same-type neighbors with different semantic relationships can be captured.

Figure 1: Example of a heterogeneous graph (ACM). (a) A heterogeneous graph ACM composed of three node types, where P denotes paper, A denotes author, and S denotes subject; (b) Several meta-paths with different semantics and different lengths in the heterogeneous graph ACM; (c) Neighbors based on the three meta-paths; for simplicity, the following content only expresses meta-paths in the order of node connections, such as P-A-P abbreviated as PAP.

In this paper, we find that the number of neighbors differs greatly between meta-paths with different semantics or different lengths. Taking the two meta-paths of the same length (PAP and PSP) in Figure 1 as an example, the number of papers reachable through shared authors differs greatly from the number of papers reachable through shared subjects; and for two semantically similar meta-paths of different lengths (PAP and PAPAP), the number of connected nodes often increases exponentially with the length. Based on this analysis, this paper further studies the average number of node neighbors under different meta-paths in three commonly used datasets (ACM, IMDB, and Yelp); the results are shown in Figure 2. In ACM, the average number of neighbors differs by a factor of 75 between PAP and PSP, two meta-paths of the same length, and by a factor of 52 between PAP and PAPAP, two meta-paths with similar semantics but different lengths. In IMDB, the number of neighbors differs by a factor of about 5 between meta-paths of the same length (MAM and MDM) and by a factor of 14 between semantically similar meta-paths of different lengths (MAM and MAMAM). The number of neighbors of meta-paths with different semantics and different lengths in Yelp also shows large differences. For convenience, this paper refers to a meta-path with a large number of neighboring nodes as a Large Neighbor Path (abbreviated as LargePath), and to a meta-path with only a few neighboring nodes as a Small Neighbor Path (abbreviated as SmallPath).

Figure 2: Mean difference in the number of node neighbors under different meta-paths.

Aggregating meta-paths whose neighbor counts differ this much in the same way, without distinction, is not well founded. For LargePaths in particular, a large set of neighbor nodes is likely to contain noise. This paper verifies this with the widely used HAN model; the results are shown in Figure 3. Since different semantics have different importance, this paper only compares the performance of meta-paths with similar semantics but different lengths. On the ACM dataset, as the number of neighbors increases, the accuracy of meta-path PAPAP decreases by up to 2% compared with PAP, and the accuracy of PAPAP+PSPSP also decreases by about 2% compared with PAP+PSP. The same problem appears on IMDB (MAMAM versus MAM, MAMAM+MDMDM versus MAM+MDM). Although the accuracy of BUBUB on Yelp is higher than that of BUB, this is because BUB has too few neighbor nodes to capture complete neighbor information. Once the number of neighbors increases, the accuracy under the large neighbor path combinations drops significantly (BUBUB+BSBSB versus BUB+BSB, BUBUB+BSBSB+BLBLB versus BUB+BSB+BLB).

Figure 3: Accuracy of HAN with different meta-paths.

However, to the best of our knowledge, existing HGNNs mostly focus on different meta-path-based aggregation algorithms, and no dedicated work has yet addressed the huge difference in the number of neighbors among different meta-paths. Although RoHe proposes an attention purification mechanism, it focuses on purifying the network under adversarial attacks rather than on the neighbor differences between meta-paths. Based on the above analysis, this paper proposes a Heterogeneous Graph Neural Network Classification and Aggregation Algorithm Based on Large and Small Neighbor Path Identification (LSPI). LSPI consists of three parts: a path discriminator, intra-path aggregation, and subgraph-level attention aggregation. After a heterogeneous graph is input into LSPI, the model first divides the meta-paths into large neighbor paths (LargePaths) and small neighbor paths (SmallPaths) through the path discriminator. For the LargePaths, LSPI uses topological priors and node feature similarities to select and aggregate the neighbor nodes with the highest topological probability and feature similarity in the neighbor set. For the SmallPaths, LSPI aggregates the meta-path subgraphs to obtain feature embeddings under specific subgraphs. Finally, LSPI fuses the LargePath and SmallPath embeddings through subgraph-level attention to generate the final node representation.

Specifically, the contributions of this paper are as follows:

  1. This paper is the first to address the huge differences in the number of neighbors among meta-paths in heterogeneous graph neural networks, and it analyzes the impact of the noise in large neighbor paths on model performance.

  2. This paper proposes a Heterogeneous Graph Neural Network Classification and Aggregation Algorithm Based on Large and Small Neighbor Path Identification (LSPI), which uses a path discriminator to divide the meta-paths into large neighbor paths and small neighbor paths and aggregates features separately for each kind of path.

  3. In large neighbor path aggregation, LSPI selects the neighbor nodes with the strongest topological relationship and the highest feature similarity, based on topological transit probability and node feature similarity.

  4. Experiments on three real-world datasets verify the superiority of LSPI on various tasks, and this paper also explores how many neighbor nodes should be retained under large neighbor paths to best improve model performance.

2 Related Work

In this section, we summarize existing related work on heterogeneous graph neural networks and on node selection for large neighbor paths.

Heterogeneous Graph Neural Networks. HAN[10] is a pioneering work on heterogeneous graph neural networks that uses manually designed meta-paths and hierarchical aggregation to capture semantic information within and between meta-paths. Because HAN ignores the intermediate nodes when aggregating within meta-paths, MAGNN[11] uses a relational rotation encoder to aggregate meta-path instances and avoid losing intermediate-node information. HPN[12] proposes a heterogeneous graph propagation network that alleviates the degradation phenomenon in deep HGNNs: it appropriately absorbs local semantics during semantic propagation, which avoids semantic confusion and allows higher-order semantics to be captured. HGT[13] designs parameters that depend on node and edge types to characterize the heterogeneous attention on each edge, which allows it to maintain dedicated representations for different types of nodes and edges; it also introduces a temporal encoding technique to capture dynamic changes in the graph. HetGNN[14] uses a heterogeneous neighbor sampling strategy together with two aggregation methods to capture both the structural information of the heterogeneous graph and the content information of each node. HetSANN[15] leverages the structural information of heterogeneous graphs to enhance node representation learning, using a structure-aware approach to handle interactions between different types of nodes. GCNH[16] uses a learnable importance coefficient to balance the contributions of central nodes and neighboring nodes, obtaining independent representations for a node and for its neighbors before combining them. BPHGNN[17] proposes a depth and breadth behavior pattern aggregation method that automatically captures local and global relevant information and adaptively learns the importance of various behavior patterns for multi-layer heterogeneous network representation learning. SR-HGN[18] captures feature information from both relational and semantic aspects and generates feature embeddings that fuse the two.

However, the aforementioned models primarily focus on different feature aggregation strategies and do not propose effective measures to address the noise problem in the large number of neighboring nodes, which can result in suboptimal outcomes.

Large Neighbor Path Node Selection. Due to the complexity of the graph structure, selecting a more valuable subset of nodes for aggregation from noise-filled large neighbor paths is a challenging task. To the best of the authors' knowledge, no dedicated research has specifically addressed this issue, but many works have made valuable attempts at neighbor selection. GCN[19], a seminal work in the field of graph neural networks, treats all first-order neighbors as direct aggregation objects and extends the aggregation to higher orders by stacking layers. GAT[20] uses an attention mechanism to weight the neighbors of a node dynamically, assigning different weights based on the degree of interaction between nodes to better capture the relationships in the graph structure. GraphSAGE[21] samples a fixed-size subset of neighboring nodes for aggregation instead of aggregating all neighbors, which reduces the impact of noisy information; however, this sampling is inherently random. HetGNN[14] uses a heterogeneous neighbor sampling strategy based on random walk with restart (RWR) to collect all types of neighbors for each vertex, and then aggregates the information of the different types of neighbor nodes to learn better node representations. RoHe[22] employs an attention purification mechanism to shield against malicious neighbor nodes during adversarial attacks; however, it primarily targets malicious nodes in adversarial scenarios rather than noisy information in large neighbor paths. DCNN[23] uses diffusion kernels, determining each node's neighbors and feature representation through a diffusion process. AGCN[24] samples the neighborhoods of nodes at different scales through multi-scale neighborhood sampling. HetSANN[15] considers the structural relationships between nodes when selecting neighbor nodes and dynamically adjusts the selection strategy based on node types and edge types, so that the chosen neighbor nodes best reflect those structural relationships. While these works select neighbor sets from different perspectives, they do not address the noise problem in large neighbor paths, so their selection strategies are adversely affected when facing large neighbor paths with noise interference.

3 Preliminaries

Definition 1 (Heterogeneous Graph [10]).

A HIN can be represented as $\mathcal{G}=\{\mathcal{V},\mathcal{E},\mathcal{A},\mathcal{R}\}$, consisting of the set of objects $\mathcal{V}$ and the set of edges $\mathcal{E}$, together with the set of object types $\mathcal{A}$ and the set of edge types $\mathcal{R}$, where $\mathcal{A}=\{\mathcal{A}_i \mid i\geq 1\}$, $\mathcal{R}=\{\mathcal{R}_i \mid i\geq 1\}$, and $|\mathcal{A}|+|\mathcal{R}|>2$.

Definition 2 (Meta-paths [10]).

A meta-path $\Phi$ is defined as a path connected by different objects and relations, in the form $A_1\xrightarrow{R_1}A_2\xrightarrow{R_2}\cdots\xrightarrow{R_l}A_{l+1}$. A meta-path describes a composite relationship $R=R_1\circ R_2\circ\cdots\circ R_l$ between node types $A_1$ and $A_{l+1}$.

Definition 3 (Meta-path-based Neighbors [11]).

For a given node $v$ and meta-path $\Phi$ in a heterogeneous graph, the meta-path-based neighbors $N_v^{\Phi}$ are defined as the set of nodes connected to node $v$ via the meta-path $\Phi$.
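Definition 3 can be illustrated with a short Python sketch that collects the PAP neighbors of each paper from a paper-to-author mapping. This is only an illustration under stated assumptions: the function name `metapath_neighbors`, the `paper_authors` dictionary and its toy contents are hypothetical, and the sketch follows the common convention that a node is its own neighbor under a symmetric meta-path such as PAP.

```python
from collections import defaultdict

def metapath_neighbors(paper_authors):
    """Return, for each paper, the papers reachable via P-A-P (shared author)."""
    # Invert the mapping: author -> set of papers written by that author.
    author_papers = defaultdict(set)
    for paper, authors in paper_authors.items():
        for author in authors:
            author_papers[author].add(paper)

    neighbors = {}
    for paper, authors in paper_authors.items():
        reached = set()
        for author in authors:
            reached |= author_papers[author]
        # The paper itself is reached too (P -> A -> same P); kept by convention.
        neighbors[paper] = reached
    return neighbors

# Hypothetical toy data: p1 and p2 share author a2, p3 is isolated.
paper_authors = {"p1": {"a1", "a2"}, "p2": {"a2"}, "p3": {"a3"}}
pap = metapath_neighbors(paper_authors)
```

Counting `len(pap[v])` for every node and averaging gives the kind of per-meta-path neighbor statistic that the introduction compares across meta-paths.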

Definition 4 (Meta-path-based Transit Probability [22]).

The transit probability based on a meta-path is defined as the probability of moving from node $v$ to node $u$ along the meta-path $\Phi$; it reflects the connectivity within the meta-path.

4 Model

This section introduces the LSPI model. LSPI takes a given HIN $\mathcal{G}$ and the attribute matrix $X_{A_i}\in\mathbb{R}^{|\mathcal{V}_{A_i}|\times d_{A_i}}$ of each node type $A_i\in\mathcal{A}$ as input and learns a mapping function $f:\mathcal{V}\rightarrow\mathbb{R}^{d}$, where $d\ll|\mathcal{V}|$, that captures the rich structural and semantic information in $\mathcal{G}$ and generates the final feature representations for downstream tasks.

Specifically, as shown in Figure 4, LSPI first uses a path discriminator to divide meta-paths according to the topology of the graph. The path discriminator divides meta-paths into LargePaths and SmallPaths by calculating the percentage change in degree values between different paths. For LargePaths, LSPI picks the most relevant nodes out of the many neighbor nodes, from both the topology and the feature perspective, and aggregates only those in order to shield the model from noise. From the topological perspective, LSPI uses transition probability priors to calculate the transit probability of nodes. From the feature perspective, LSPI calculates the feature similarities of all nodes. LSPI then selects the nodes with the highest transit probability and feature similarity from the large neighbor path for aggregation. For SmallPaths, LSPI uses a convolution operation to capture the feature information of specific subgraphs. Finally, LSPI employs subgraph-level attention to fuse the features of all LargePaths and SmallPaths into the final node embedding.

Figure 4: LSPI framework structure.

4.1 Path Discriminator

The path discriminator first calculates the total degree value of all nodes under each meta-path, as shown in Equation 1:

$$D_{\Phi_m}=\sum_{i=1}^{|\mathcal{V}_A|}\sum_{j=1}^{|\mathcal{V}_A|}d_{i,j} \tag{1}$$

In the above equation, $D_{\Phi_m}$ is the degree sum of all the nodes under the meta-path $\Phi_m$, $d_{i,j}$ denotes the $j$-th element of the $i$-th row of the adjacency matrix of $\Phi_m$, and $|\mathcal{V}_A|$ denotes the number of nodes of the target type.

After the above calculation, the degree value set $D_{\Phi}=\{D_{\Phi_i}\mid i\in(1,\ldots,p)\}$ of all meta-paths is obtained, where $p$ is the number of meta-paths. The path discriminator partitions the paths by calculating the relative differences between the degree values:

$$D_{\Phi_{min}}=\min(D_{\Phi}) \tag{2}$$
$$R_{\Phi_i}=\frac{D_{\Phi_i}-D_{\Phi_{min}}}{D_{\Phi_{min}}}\times 100 \tag{3}$$

After obtaining the relative difference percentages between meta-paths, $R_{\Phi}=\{R_{\Phi_i}\mid i\in(1,\ldots,p)\}$, the path discriminator divides the meta-paths into large neighbor paths and small neighbor paths according to these values:

$$LargePaths=\{\Phi_b\in\Phi\mid R_{\Phi_b}\geq\tau\} \tag{4}$$
$$SmallPaths=\{\Phi_s\in\Phi\mid R_{\Phi_s}<\tau\} \tag{5}$$

where $\tau$ is a hyperparameter. When $\tau$ is 100%, the degree value of every path in the LargePaths set is at least twice the minimum degree value. The value of $\tau$ is analyzed in the experimental part.
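The partition in Equations 1 to 5 can be sketched in a few lines of Python. This is a minimal illustration and not the authors' released code: the function name `path_discriminator`, the dictionary-of-matrices input and the toy adjacency matrices are assumptions made for the example, and the sketch assumes that every meta-path has at least one edge so that the minimum degree is non-zero.

```python
def path_discriminator(adjacency, tau=100.0):
    """Split meta-paths into LargePaths and SmallPaths.

    adjacency: dict mapping a meta-path name to its square 0/1 adjacency
               matrix over target-type nodes (list of lists).
    tau:       threshold on the relative difference, in percent.
    """
    # Eq. (1): total degree of each meta-path = sum of all matrix entries.
    degree = {name: sum(sum(row) for row in mat)
              for name, mat in adjacency.items()}
    # Eq. (2): smallest total degree among all meta-paths.
    d_min = min(degree.values())
    # Eq. (3): relative difference to the minimum, as a percentage.
    relative = {name: (d - d_min) / d_min * 100.0
                for name, d in degree.items()}
    # Eqs. (4) and (5): threshold at tau.
    large = [name for name, r in relative.items() if r >= tau]
    small = [name for name, r in relative.items() if r < tau]
    return large, small, relative

# Hypothetical toy graph with three target nodes.
adjacency = {
    "PAP": [[1, 0, 0], [0, 1, 0], [0, 0, 1]],  # total degree 3
    "PSP": [[1, 1, 1], [1, 1, 1], [1, 1, 1]],  # total degree 9
}
large, small, relative = path_discriminator(adjacency, tau=100.0)
```

In this toy case PSP has three times the total degree of PAP, so its relative difference is 200% and it is placed in LargePaths at $\tau = 100$.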

4.2 Large Neighbor Path Neighbor Node Selection

Due to the existence of a large number of neighbor nodes in the large neighbor path, as analyzed in the previous section, it is difficult to truly avoid noise interference when using attention aggregation. Convolution operations treat all nodes as equally important, which inevitably reduces the model’s accuracy in the presence of noise. Therefore, filtering out noisy nodes from multiple neighbors to improve the effectiveness of aggregation is an effective method to optimize the aggregation of large neighborhood paths. LSPI selects the neighbors with the highest topological relationships and feature similarity from both topological and feature perspectives for aggregation, in order to avoid noise interference.

At the topological level, this paper first calculates the transit probability under the meta-path. For each relation $R_i$ in $\Phi$ there is a transit probability $P^{R_i}=\left(D^{R_i}\right)^{-1}Adj^{R_i}$, where $Adj^{R_i}$ and $D^{R_i}$ denote the adjacency matrix and the degree matrix of the relation $R_i$, respectively. It can be seen that $P^{R_i}$ is affected by the number of neighbors, so $P_{vu}^{R_i}$ can be understood as the probability of passing through the relation $R_i$ from node $v$ to node $u$. For the meta-path $\Phi$, the transit probability can be calculated as:

$$P^{\Phi}=P^{R_1}P^{R_2}\cdots P^{R_l} \tag{6}$$
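As a rough sketch of Equation 6, the following pure-Python example row-normalizes each relation's adjacency matrix and multiplies the results along the meta-path. The helper names (`row_normalize`, `matmul`, `metapath_transit`) and the toy paper-author incidence matrix are hypothetical, and rows with zero degree are left as all zeros, which is an assumption the text does not specify.

```python
def row_normalize(adj):
    """P^R = (D^R)^{-1} Adj^R: divide every row by its degree (row sum)."""
    out = []
    for row in adj:
        deg = sum(row)
        out.append([x / deg if deg else 0.0 for x in row])
    return out

def matmul(a, b):
    """Plain dense matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def metapath_transit(relation_adjs):
    """Eq. (6): product of the per-relation transit matrices along the path."""
    result = row_normalize(relation_adjs[0])
    for adj in relation_adjs[1:]:
        result = matmul(result, row_normalize(adj))
    return result

# Hypothetical toy data: 3 papers x 2 authors (P-A relation).
pa = [[1, 1], [0, 1], [1, 0]]
ap = [list(col) for col in zip(*pa)]   # A-P relation is the transpose
p_pap = metapath_transit([pa, ap])     # transit probabilities under PAP
```

Each row of `p_pap` sums to 1 here, so entry `p_pap[v][u]` can be read as the probability of reaching paper `u` from paper `v` by a two-step walk through an author.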

At the feature level, based on the assumption that nodes with similar features are more important, this paper uses the feature similarity between node $v$ and node $u$ as the feature-based criterion. Given the meta-path $\Phi$, for a target-type node $V_i\in\mathcal{V}_A$ we have:

$$h_i'=\frac{h_i}{\|h_i\|} \tag{7}$$
$$s_{vu}^{\Phi}=h_v'\cdot h_u'+\epsilon \tag{8}$$

where $s_{vu}^{\Phi}$ denotes the similarity between the normalized feature vectors of node $v$ and node $u$ under the given meta-path $\Phi$, $h_i$ is the feature vector of node $V_i$, $\|\cdot\|$ is the $L_2$ norm, and $\epsilon$ is a very small value that avoids adverse effects on subsequent operations when the similarity is 0.
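Equations 7 and 8 amount to a cosine similarity with a small offset. A minimal Python sketch follows; the function names and the default `eps=1e-8` are assumptions (the text only says that $\epsilon$ is very small), and returning an all-zero vector unchanged when its norm is zero is likewise a choice made only for this example.

```python
import math

def l2_normalize(h):
    """Eq. (7): scale a feature vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in h))
    # Assumption: leave an all-zero vector unchanged to avoid dividing by zero.
    return [x / norm for x in h] if norm else list(h)

def feature_similarity(h_v, h_u, eps=1e-8):
    """Eq. (8): dot product of the normalized vectors plus a small epsilon."""
    hv, hu = l2_normalize(h_v), l2_normalize(h_u)
    return sum(a * b for a, b in zip(hv, hu)) + eps

# Parallel vectors give a similarity close to 1, orthogonal ones close to 0.
s_parallel = feature_similarity([1.0, 0.0], [2.0, 0.0])
s_orthogonal = feature_similarity([1.0, 0.0], [0.0, 3.0])
```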

LSPI selects the neighbor nodes in large neighbor paths that score well on both feature similarity and topological relationship:

$$t_{vu}^{\Phi}=P_{vu}^{\Phi}\cdot s_{vu}^{\Phi} \tag{9}$$

Here $t_{vu}^{\Phi}\in t_v^{\Phi}$ is the combined topological and feature importance score of the neighbor $u\in\mathcal{N}_v^{\Phi}$ of node $v$. LSPI reconstructs the connectivity under the meta-paths based on the calculated importance scores:

$$C_v^{\Phi}=Select\_Top(t_v^{\Phi},T) \tag{10}$$

$C_v^{\Phi}$ is the set of neighbors of node $v$ under meta-path $\Phi$ that remain after the selection over its neighbors $u\in\mathcal{N}_v^{\Phi}$. The function $Select\_Top(\cdot)$ selects the $T$ nodes with the highest values in the score vector $t_v^{\Phi}$, and $T$ is a hyperparameter indicating the number of neighbor nodes retained under the meta-path $\Phi$.
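The scoring and selection steps in Equations 9 and 10 can be sketched for a single node as follows. The function name `filter_neighbors` and the toy vectors are hypothetical. Restricting the candidates to nodes with non-zero transit probability and breaking ties by node index are assumptions of this sketch, not details given in the text.

```python
def filter_neighbors(p_row, s_row, T):
    """Keep the T best neighbors of one node v.

    p_row: transit probabilities P_vu under the meta-path, one per node u.
    s_row: feature similarities s_vu, one per node u.
    T:     number of neighbors to retain.
    """
    # Eq. (9): combined importance score t_vu = P_vu * s_vu.
    t = [p * s for p, s in zip(p_row, s_row)]
    # Assumption: only nodes that are actual meta-path neighbors (P_vu > 0)
    # are candidates, so non-neighbors are never selected.
    candidates = [u for u in range(len(t)) if p_row[u] > 0]
    # Eq. (10): indices of the T highest scores (stable sort keeps index order on ties).
    ranked = sorted(candidates, key=lambda u: t[u], reverse=True)
    return ranked[:T]

# Hypothetical toy row: node 3 is very similar but not a meta-path neighbor.
p_row = [0.5, 0.25, 0.25, 0.0]
s_row = [1.0, 0.2, 0.9, 0.99]
kept = filter_neighbors(p_row, s_row, T=2)
```

With these values the scores are 0.5, 0.05, 0.225 and 0, so the retained neighbors are nodes 0 and 2: node 1 is reachable but dissimilar, and node 3 is similar but unreachable.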

4.3 Intra-path Aggregation

4.3.1 Node Feature Conversion

Considering that different types of nodes may lie in different feature spaces, we first project the features of all node types to the same dimension for ease of operation. For a node $u\in\mathcal{V}_{A}$ of type $A\in\mathcal{A}$, we have:

$$x_{u}^{\prime}=W_{A}\cdot x_{u}^{A} \tag{11}$$

where $x_{u}^{A}\in\mathbb{R}^{d_{A}}$ and $x_{u}^{\prime}\in\mathbb{R}^{d^{\prime}}$ are the original features of node $u$ and the projected features after feature conversion, respectively. $W_{A}\in\mathbb{R}^{d^{\prime}\times d_{A}}$ is the feature projection matrix for node type $A$.
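A small numerical sketch of the type-specific projection in Eq. (11) follows. The node types, dimensions, and random initialization are illustrative assumptions; in the model the matrices $W_A$ are learned.

```python
import numpy as np

rng = np.random.default_rng(0)

d_out = 4                      # shared dimension d'
raw_dims = {"P": 6, "A": 3}    # hypothetical raw dimension d_A per node type

# One projection matrix W_A of shape (d', d_A) per node type
W = {t: rng.normal(size=(d_out, d_in)) for t, d_in in raw_dims.items()}

def project(x, node_type):
    """Apply x' = W_A x to one raw feature vector of the given node type."""
    return W[node_type] @ x

x_paper = rng.normal(size=raw_dims["P"])
x_author = rng.normal(size=raw_dims["A"])
# Both node types end up in the same d'-dimensional space
print(project(x_paper, "P").shape, project(x_author, "A").shape)
```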

4.3.2 Large Neighbor Path Aggregation

Following Section 4.2, LSPI obtains the connectivity matrices under the different meta-paths, $\mathcal{C}=\left(C^{\Phi_{1}},C^{\Phi_{2}},\ldots,C^{\Phi_{p}}\right)$. For the connectivity under each large neighbor path $\Phi_{b}\in LargePaths$, LSPI performs a convolution operation on the corresponding subgraph to capture the features of the similar nodes retained after selection.

$$h_{u,\Phi_{b}}^{(l+1)}=D_{\Phi_{b}}^{-\frac{1}{2}}\widehat{A_{\Phi_{b}}}D_{\Phi_{b}}^{-\frac{1}{2}}h_{u,\Phi_{b}}^{(l)}W_{b}^{(l)} \tag{12}$$

where $h_{u,\Phi_{b}}^{(l+1)}$ is the feature representation of node $u$ at the $(l+1)$-th convolution layer, $\widehat{A_{\Phi_{b}}}\in\mathbb{R}^{d^{\prime}\times d^{\prime}}$ is the adjacency matrix under $\Phi_{b}$, which is normalized by the corresponding degree matrix $D_{\Phi_{b}}$, $W_{b}^{(l)}\in\mathbb{R}^{d^{\prime}\times d^{\prime}}$ is the learnable parameter matrix, and $h_{u,\Phi_{b}}^{(l)}$ is the feature representation of node $u$ at the $l$-th convolution layer, where $h_{u,\Phi_{b}}^{(0)}$ is the initial node feature $x_{u}^{\prime}$.

4.3.3 Small Neighbor Path Aggregation

For small neighbor paths, this paper constructs a subgraph based on each small neighbor path and performs subgraph-level convolution aggregation on it. For the connection relationship under the small neighbor path $\Phi_{m}\in SmallPaths$, LSPI performs a convolution operation on the subgraph to capture the node embedding under the small neighbor path.

$$h_{u,\Phi_{m}}^{(l+1)}=D_{\Phi_{m}}^{-\frac{1}{2}}\widehat{A_{\Phi_{m}}}D_{\Phi_{m}}^{-\frac{1}{2}}h_{u,\Phi_{m}}^{(l)}W_{m}^{(l)} \tag{13}$$
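Eqs. (12) and (13) share the same propagation form and differ only in the subgraph and the weight matrix used. The sketch below shows one such propagation step in NumPy. Adding self-loops before normalization is an assumption borrowed from the standard GCN formulation (this section does not state how $\widehat{A}$ is built), and no nonlinearity is applied because the equations do not show one.

```python
import numpy as np

def gcn_layer(A, H, W, self_loops=True):
    """One propagation step H' = D^{-1/2} A_hat D^{-1/2} H W.

    A: (n, n) adjacency matrix of one meta-path subgraph.
    H: (n, d) node features at layer l.
    W: (d, d_out) weight matrix (learned in practice).
    self_loops: assumption of this sketch, A_hat = A + I when True.
    """
    n = A.shape[0]
    A_hat = A + np.eye(n) if self_loops else A.astype(float)
    deg = A_hat.sum(axis=1)
    # Isolated nodes have degree 0; their inverse-sqrt degree is set to 0
    inv_sqrt = np.zeros_like(deg)
    nz = deg > 0
    inv_sqrt[nz] = deg[nz] ** -0.5
    # Symmetric normalization: scale rows and columns by D^{-1/2}
    A_norm = inv_sqrt[:, None] * A_hat * inv_sqrt[None, :]
    return A_norm @ H @ W

# Toy path graph 0 - 1 - 2 with identity features and weights
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
print(gcn_layer(A, np.eye(3), np.eye(3)))
```

In LSPI the large neighbor paths would use the filtered connectivity from Eq. (10) with weights $W_b$, while the small neighbor paths would use their original connectivity with weights $W_m$.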

4.4 Subgraph-level Attention Aggregation

Since each meta-path carries different semantic information and a specific meta-path reflects node information from only a single viewpoint, different meta-paths must be fused to learn more comprehensive node embeddings. Taking the node embeddings $H=\{H^{\Phi_{1}},\ldots,H^{\Phi_{p}}\}$ as input, LSPI employs subgraph-level attention to aggregate the embeddings learned from different meta-paths.

$$w_{i}=\frac{1}{|\mathcal{V}|}\sum_{i\in\mathcal{V}}q_{H}^{T}\cdot\tanh\left(W_{H}\cdot H_{i}^{\Phi}+b_{H}\right) \tag{14}$$

$$\beta_{i}=\frac{\exp\left(w_{i}\right)}{\sum_{i=1}^{P}\exp\left(w_{i}\right)} \tag{15}$$

$$Z=\sum_{i=1}^{P}\beta_{i}H_{i} \tag{16}$$

where $W_{H}$ is the weight matrix, $b_{H}$ is the bias vector, $q_{H}\in\mathbb{R}^{1\times d^{\prime}}$ is the graph-level attention vector, and $\beta_{i}$ is the contribution of meta-path $\Phi_{i}$ to a particular task. $Z$ is the final feature embedding after fusing different semantics.
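The fusion in Eqs. (14)-(16) can be sketched as follows. The sketch reads Eq. (14) as one scalar score per meta-path obtained by averaging over nodes, uses a row-vector convention (`H @ W_H` instead of $W_H\cdot H$), and treats the attention dimension as a free choice; all three are assumptions of this example.

```python
import numpy as np

def fuse(H_list, W_H, b_H, q_H):
    """Fuse per-meta-path embeddings with subgraph-level attention.

    H_list: list of P arrays of shape (n, d), one per meta-path.
    W_H: (d, d_att), b_H: (d_att,), q_H: (d_att,).
    Returns (Z, beta) with Z of shape (n, d) and beta of shape (P,).
    """
    # Eq. (14): node-averaged score q^T tanh(W h + b), one scalar per meta-path
    w = np.array([(np.tanh(H @ W_H + b_H) @ q_H).mean() for H in H_list])
    # Eq. (15): softmax over meta-paths (shifted by the max for stability)
    e = np.exp(w - w.max())
    beta = e / e.sum()
    # Eq. (16): weighted sum of the per-path embeddings
    Z = sum(b * H for b, H in zip(beta, H_list))
    return Z, beta

rng = np.random.default_rng(1)
H_list = [rng.normal(size=(5, 4)) for _ in range(3)]   # 3 meta-paths, 5 nodes
W_H = rng.normal(size=(4, 8))
b_H = np.zeros(8)
q_H = rng.normal(size=8)
Z, beta = fuse(H_list, W_H, b_H, q_H)
print(beta, Z.shape)
```

Because the weights $\beta_i$ are shared by all nodes, they can be read directly as the relative importance of each meta-path for the task.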

For the semi-supervised node classification task, this paper uses the cross-entropy loss to optimize the model:

$$\mathcal{L}=-\sum_{l\in Y_{L}}Y_{v_{l}}\cdot\ln\left(C\cdot z_{v_{l}}\right) \tag{17}$$

where $Y_{v_{l}}$ and $z_{v_{l}}$ are the label and embedding vectors of node $v_{l}$, respectively, and $C$ is the classifier parameter. The overall learning algorithm is outlined in Algorithm 1.
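A minimal sketch of the loss in Eq. (17) is given below. It assumes that a softmax is applied to the classifier output $C\cdot z_{v_l}$ before taking the logarithm, which the equation leaves implicit, and that labels are one-hot vectors.

```python
import numpy as np

def cross_entropy(Z, C, Y, labeled_idx):
    """Sum of cross-entropy terms over the labeled nodes.

    Z: (n, d) node embeddings.
    C: (k, d) classifier weights for k classes.
    Y: (n, k) one-hot labels.
    labeled_idx: indices of the labeled nodes Y_L.
    """
    logits = Z[labeled_idx] @ C.T
    # Log-softmax, computed stably by subtracting the row maximum
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -(Y[labeled_idx] * log_probs).sum()

# With all-zero embeddings every class is equally likely,
# so each labeled node contributes ln(k) to the loss.
Z0 = np.zeros((4, 3))
C0 = np.ones((2, 3))
Y0 = np.eye(2)[[0, 1, 0, 1]]
print(cross_entropy(Z0, C0, Y0, [0, 2]))
```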

Algorithm 1 The overall learning algorithm of LSPI
1: Input: the heterogeneous graph $\mathcal{G}=\{\mathcal{V},\mathcal{E},\mathcal{A},\mathcal{R}\}$, the initial node features $X$, region set $R$, heterogeneous neighbor set $\Phi$, hyperparameters $\tau$ and $T$, the LSPI model.
2: Output: the node embeddings $Z$.
3: for meta-path $\Phi_{i}\in\Phi$ do
4:     Divide the meta-paths into large neighbor paths and small neighbor paths by Eq. (1-4)
5: end for
6: for $\Phi_{b}\in LargePaths$ do
7:     Calculate the topological probability of the nodes in $\Phi_{b}$ by Eq. (6)
8:     Calculate the feature similarity by Eq. (7, 8)
9:     Select the neighbor nodes in $\Phi_{b}$ by Eq. (9)
10:     Reconstruct the connection relationship in the meta-path by Eq. (10)
11: end for
12: Perform feature transformation by Eq. (11)
13: for $\Phi_{b}\in LargePaths$ do
14:     Compute node embeddings in the relation matrix by Eq. (12)
15: end for
16: for $\Phi_{m}\in SmallPaths$ do
17:     Perform the subgraph convolution operation by Eq. (13)
18: end for
19: Apply subgraph-level attention and calculate the final embedding $Z$ by Eq. (14-16)
20: Backpropagate and update parameters according to Eq. (17)

5 Experiment

5.1 Datasets and Baselines

This paper selects three widely used heterogeneous graph datasets, ACM, IMDB, and Yelp, to evaluate LSPI. The ACM dataset contains a large number of academic papers covering a broad range of disciplines, with its main nodes consisting of papers (P), authors (A), and subjects (S). The IMDB dataset focuses on information from the movie and television industry, with its main nodes consisting of movies (M), directors (D), and actors (A). The Yelp dataset, a large-scale social media data source widely used in natural language processing and recommendation systems, includes user reviews and ratings of businesses, with its main nodes consisting of businesses (B), users (U), services (S), and rating levels (L). The average degree values and relative difference percentages of the meta-paths in the three datasets are shown in Table 1.

Table 1: Datasets information
| Datasets | Number of nodes | Linkage | Meta-path | Meta-path average degree value | $R_{\Phi}$ |
|---|---|---|---|---|---|
| ACM | P: 4019, A: 7167, S: 60 | P-A, P-S | PAP | 14.39 (min) | 0 |
| | | | PSP | 1079.42 | 7401.181 |
| | | | PAPAP | 1079.42 | 7401.181 |
| | | | PSPSP | 752.33 | 5128.145 |
| | | | Others | - | - |
| IMDB | M: 4278, D: 2081, A: 5257 | M-A, M-D | MAM | 19.95 | 390.172 |
| | | | MDM | 4.07 (min) | 0 |
| | | | MAMAM | 280.2 | 6784.521 |
| | | | MDMDM | 4.07 | 0 |
| | | | Others | - | - |
| Yelp | B: 2614, U: 1286, S: 4, L: 9 | B-U, B-S, B-L | BUB | 202.11 (min) | 0 |
| | | | BSB | 947.86 | 368.9822 |
| | | | BLB | 568.97 | 181.515 |
| | | | BUBUB | 1885.81 | 833.0612 |
| | | | BSBSB | 947.86 | 368.9822 |
| | | | BLBLB | 568.97 | 181.515 |
| | | | Others | - | - |
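The $R_{\Phi}$ values in Table 1 are numerically consistent with the percentage by which a meta-path's average degree exceeds the smallest average degree in the same dataset. The sketch below reproduces the column under that reading, which is inferred from the table values rather than taken from a definition in this section; the function name is hypothetical.

```python
def relative_difference(avg_degree):
    """Percentage by which each average degree exceeds the dataset minimum.

    avg_degree: dict mapping meta-path name -> average degree.
    """
    d_min = min(avg_degree.values())
    return {p: (d - d_min) / d_min * 100 for p, d in avg_degree.items()}

# Average degrees copied from Table 1
acm = {"PAP": 14.39, "PSP": 1079.42, "PAPAP": 1079.42, "PSPSP": 752.33}
imdb = {"MAM": 19.95, "MDM": 4.07, "MAMAM": 280.2, "MDMDM": 4.07}

for name, table in (("ACM", acm), ("IMDB", imdb)):
    for path, r in relative_difference(table).items():
        print(f"{name} {path}: {r:.3f}")
```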

In this paper, HAN (2019) [10], MAGNN (2020) [11], HGSL (2021) [25], HPN (2023) [12], ie-HGCN (2023) [26], and SR-HGNN (2023) [27] are selected as baselines, and the performance of LSPI is compared with these baselines under different experiments.

5.2 Experimental Parameter Setup

The experimental parameters are set as follows: hidden layer dimension $d=64$, learning rate $lr=0.005$, Adam optimizer, weight decay $6.0\times10^{-4}$, feature dropout 0.5, and a maximum of 1000 iterations. The training and validation sets are set to 10% of the total dataset, and 80% of the nodes are used for testing. For meta-path selection, this paper takes into account meta-paths with different numbers of neighbors. Specifically, the ACM dataset uses PAP, PSPSP, and PAPAP (PSP is the same as PSPSP, so only one of them is selected); the IMDB dataset uses MAM, MAMAM, and MDMDM (MDM is the same as MDMDM, so only one of them is selected); and the Yelp dataset uses BUB, BUBUB, BSBSB, and BLBLB (BSB is the same as BSBSB and BLB is the same as BLBLB, so only one of each pair is selected).

For the division into large and small neighbor paths, a different $\tau$ is set for each dataset because the datasets have different characteristics. Specifically, for the ACM dataset, with $\tau=30$, the large neighbor paths identified by the path discriminator are PSPSP and PAPAP. For the IMDB dataset, with $\tau=200$, the large neighbor path identified is MAMAM. For the Yelp dataset, with $\tau=100$, the large neighbor paths identified are BUBUB and BSBSB. The large and small neighbor paths corresponding to different $\tau$ values are shown in Table 2. The number of nodes selected from large neighbor paths, $T$, is uniformly set to 500 for all datasets. Section 5.7 studies the impact of different $\tau$ and $T$ values on the model. Section 5.8 examines how model performance changes when $\tau$ is not set and all paths are treated either as large neighbor paths or as small neighbor paths. The selection of $T$ is studied further in Section 5.10.

Table 2: Division into large and small neighbor paths under different $\tau$ values

| Datasets | $\tau$ | LargePaths | SmallPaths |
|---|---|---|---|
| ACM | 30 | PAPAP, PSPSP | PAP |
| | 100 | PSPSP | PAP, PAPAP |
| IMDB | 100 | MAM, MAMAM | MDMDM |
| | 200 | MAMAM | MAM, MDMDM |
| Yelp | 70 | BUBUB, BSBSB, BLBLB | BUB |
| | 100 | BUBUB, BSBSB | BUB, BLBLB |

5.3 Classification Experiment

To evaluate the performance of the LSPI model in multi-label classification tasks, the target node embeddings generated by each model were fed into SVM classifiers with different training ratios and evaluated using Micro-F1 and Macro-F1. The experimental results are shown in Table 3.

Table 3: Classification experiment results
| Datasets | Metrics | Split | HAN | MAGNN | HGSL | RoHe | ie-HGCN | HPN | SR-HGNN | LSPI |
|---|---|---|---|---|---|---|---|---|---|---|
| ACM | Macro-F1 | 0.8 | 92.98 | 92.67 | 92.84 | 64.08 | 92.79 | 91.27 | 92.95 | 94.17 |
| | | 0.6 | 92.91 | 92.18 | 92.75 | 93.38 | 92.59 | 91.24 | 92.98 | 93.89 |
| | | 0.4 | 92.72 | 91.39 | 92.59 | 93.86 | 92.14 | 91.08 | 92.71 | 93.74 |
| | | 0.2 | 92.44 | 90.02 | 92.41 | 92.85 | 91.35 | 90.95 | 92.33 | 93.59 |
| | Micro-F1 | 0.8 | 92.9 | 92.61 | 92.75 | 91.91 | 92.73 | 91.21 | 92.85 | 94.05 |
| | | 0.6 | 92.86 | 92.13 | 92.67 | 93.37 | 92.53 | 91.14 | 92.87 | 93.77 |
| | | 0.4 | 92.67 | 91.38 | 92.53 | 93.32 | 92.11 | 90.98 | 92.62 | 93.62 |
| | | 0.2 | 92.36 | 89.94 | 92.36 | 93.59 | 91.27 | 90.85 | 92.23 | 93.48 |
| IMDB | Macro-F1 | 0.8 | 59.34 | 59.94 | 58.77 | 52.66 | 59.87 | 58.15 | 60.04 | 63.2 |
| | | 0.6 | 59.93 | 59.72 | 58.21 | 55.51 | 59.65 | 58.48 | 59.89 | 62.22 |
| | | 0.4 | 59.7 | 59.23 | 58.02 | 54.98 | 59.33 | 58.42 | 59.54 | 62.23 |
| | | 0.2 | 59.65 | 57.87 | 58.15 | 55.47 | 58.24 | 58.13 | 58.91 | 61.1 |
| | Micro-F1 | 0.8 | 59.54 | 60.06 | 59.11 | 55.25 | 59.82 | 58.38 | 60.19 | 63.51 |
| | | 0.6 | 60.12 | 59.8 | 58.54 | 56.04 | 59.57 | 58.64 | 60 | 62.46 |
| | | 0.4 | 59.86 | 59.29 | 58.48 | 55.64 | 59.26 | 58.61 | 59.67 | 62.48 |
| | | 0.2 | 59.79 | 57.89 | 58.52 | 55.93 | 58.16 | 58.3 | 59.03 | 61.39 |
| Yelp | Macro-F1 | 0.8 | 71.84 | 92.8 | 93.43 | 93.92 | 91.84 | 90.65 | 90.06 | 94.49 |
| | | 0.6 | 70.27 | 92.64 | 94.41 | 93.13 | 91.83 | 90.41 | 89.88 | 94.08 |
| | | 0.4 | 67.82 | 91.91 | 93.31 | 92.37 | 91.34 | 89.7 | 89.35 | 93.75 |
| | | 0.2 | 63.45 | 90.98 | 93.05 | 92.13 | 91.2 | 88.9 | 88.83 | 93.15 |
| | Micro-F1 | 0.8 | 81.73 | 91.73 | 92.75 | 94.53 | 90.97 | 90.43 | 90.2 | 93.87 |
| | | 0.6 | 81.02 | 91.49 | 92.62 | 92.88 | 90.85 | 90.09 | 89.93 | 93.46 |
| | | 0.4 | 80.23 | 90.72 | 92.5 | 92.55 | 90.49 | 89.42 | 89.51 | 93.01 |
| | | 0.2 | 78.98 | 89.75 | 92.22 | 91.68 | 90.35 | 88.93 | 89.18 | 92.43 |

As shown in Table 3, LSPI achieves the best result in most settings under different training ratios and improves substantially over several baselines. Specifically, at a training ratio of 80%, LSPI improves Macro-F1 and Micro-F1 over HAN by 1.19 and 1.15 percentage points on the ACM dataset, by 3.86 and 3.97 on the IMDB dataset, and by 22.65 and 12.14 on the Yelp dataset. Compared to ie-HGCN, LSPI improves by 3.33 and 3.69 percentage points on the IMDB dataset and by 2.65 and 2.9 on the Yelp dataset. Compared to HPN, LSPI improves by 5.05 and 5.13 percentage points on the IMDB dataset and by 3.84 and 3.44 on the Yelp dataset. We attribute this to LSPI's ability to effectively filter out noisy information before feature aggregation, resulting in higher aggregation quality.
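For reference, the two metrics reported in Table 3 can be computed from predicted and true labels as sketched below. The sketch assumes single-label multi-class predictions and omits the SVM classifier itself; it is an illustration of the metric definitions, not the evaluation script used in the paper.

```python
def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    classes = sorted(set(y_true) | set(y_pred))
    tp = {c: 0 for c in classes}
    fp = {c: 0 for c in classes}
    fn = {c: 0 for c in classes}
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # predicted class p wrongly
            fn[t] += 1   # missed the true class t

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro-F1 pools the counts over classes; Macro-F1 averages per-class F1
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    return micro, macro

y_true = [0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 1, 1, 1, 0]
micro, macro = f1_scores(y_true, y_pred)
print(f"Micro-F1={micro:.4f}, Macro-F1={macro:.4f}")
```

Macro-F1 weights every class equally, so it penalizes poor performance on rare classes more strongly than Micro-F1, which is why the two columns in Table 3 can diverge on imbalanced datasets.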

5.4 Clustering Experiment

To divide the nodes in the dataset into different clusters or groups, we evaluate the performance of the LSPI model through a node clustering task. The learned node embeddings are used as input to the clustering model. We use the K-means algorithm for clustering and evaluate the clustering performance based on Normalized Mutual Information (NMI) and Adjusted Rand Index (ARI). The experimental results are shown in Table 4.

Table 4: Clustering experiment results
| Datasets | Metrics | HAN | MAGNN | RoHe | ie-HGCN | SR-HGNN | LSPI |
|---|---|---|---|---|---|---|---|
| ACM | NMI | 0.6995 | 0.7016 | 0.6756 | 0.4947 | 0.6952 | 0.7541 |
| | ARI | 0.7401 | 0.7214 | 0.6979 | 0.3489 | 0.7465 | 0.7883 |
| IMDB | NMI | 0.1201 | 0.1308 | 0.1131 | 0.1308 | 0.1371 | 0.1471 |
| | ARI | 0.1017 | 0.1276 | 0.1079 | 0.1304 | 0.1511 | 0.1612 |
| Yelp | NMI | 0.3986 | 0.4734 | 0.5335 | 0.1785 | 0.4998 | 0.6791 |
| | ARI | 0.4461 | 0.3823 | 0.4754 | 0.0639 | 0.5881 | 0.6994 |

Table 4 shows that LSPI achieves the best clustering results on all three datasets. Compared with ie-HGCN in particular, LSPI improves ARI by about 44 percentage points (0.3489 to 0.7883) and NMI by about 26 percentage points (0.4947 to 0.7541) on ACM, and improves NMI and ARI by 50.06 and 63.55 percentage points, respectively, on Yelp. LSPI also achieves clear improvements over the other models on the three datasets, which further demonstrates the superiority of the method.
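The two clustering metrics in Table 4 can be sketched in plain Python as follows. The K-means step is omitted, and the NMI normalization by the arithmetic mean of the two entropies is an assumption of this sketch (other normalizations exist, and the paper does not say which one it uses).

```python
from collections import Counter
from math import comb, log

def ari(labels_true, labels_pred):
    """Adjusted Rand Index between two flat clusterings (needs >= 2 samples)."""
    n = len(labels_true)
    cells = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:   # degenerate case, e.g. one cluster on both sides
        return 1.0
    return (sum_cells - expected) / (max_index - expected)

def nmi(labels_true, labels_pred):
    """Mutual information normalized by the mean of the two label entropies."""
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    rows, cols = Counter(labels_true), Counter(labels_pred)
    mi = sum(c / n * log(c * n / (rows[a] * cols[b]))
             for (a, b), c in joint.items())
    h_true = -sum(c / n * log(c / n) for c in rows.values())
    h_pred = -sum(c / n * log(c / n) for c in cols.values())
    mean_h = (h_true + h_pred) / 2
    return mi / mean_h if mean_h > 0 else 1.0

# Cluster ids are arbitrary: a relabeled but identical partition scores 1
print(ari([0, 0, 1, 1], [1, 1, 0, 0]), nmi([0, 0, 1, 1], [1, 1, 0, 0]))
```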

5.5 Visualization

In this section, t-distributed stochastic neighbor embedding (t-SNE) [28] is used to map the node embeddings of the ACM dataset into a two-dimensional space, and three colors are used to mark different nodes. The experimental results are shown in Figure 5.

Figure 5: Visualization experiments for node embedding on ACM datasets.

5.6 Ablation Study

In order to view the performance of different modules of LSPI, this paper deletes the large neighbor path aggregation module (LSPI-w/o-L) and the small neighbor path aggregation module (LSPI-w/o-S) to see the impact of different modules on model performance. After removing one of the modules, all meta-paths will be directly sent to another module for feature aggregation without going through the discriminator. For other settings, refer to Section 5.2.

Figure 6: Classification and clustering results under ablation study.

As can be seen from Figure 6, on ACM and Yelp the performance of LSPI-w/o-S, which de-noises all meta-paths with the large neighbor path module, is significantly better than that of LSPI-w/o-L, which directly aggregates neighbors based on meta-paths. This indicates that the large neighbor path aggregation module is highly effective at removing noise nodes. On the IMDB dataset, however, the performance of LSPI-w/o-S decreases significantly compared with that of LSPI-w/o-L, indicating that the small neighbor path aggregation module retained in LSPI-w/o-L also plays a key role in improving model performance. At the same time, the accuracy of LSPI-w/o-L and LSPI-w/o-S on all datasets is lower than that of LSPI. This further shows that the large neighbor path module can effectively shield the noise information in the meta-paths, but is not conducive to preserving the original topological relationships of the HIN. Therefore, keeping both the large neighbor path module and the small neighbor path aggregation module is the best choice.

5.7 Parameter Sensitivity Analysis

Furthermore, this paper examines the impact of two hyperparameters on the model's performance: the parameter $\tau$ for the division of large and small neighbor paths, and the number of neighbor nodes $T$ selected from the large neighbor paths. $\tau$ determines the division between large and small neighbor paths, while $T$ determines the final number of nodes retained in the large neighbor paths. For $\tau$, various values are selected to divide the meta-paths into different categories, and the corresponding large and small neighbor paths are shown in Table 2. For $T$, the values $\{100,300,500,700,1000\}$ are used to observe the impact of different node counts on model accuracy. The experimental results are shown in Figure 7. Section 5.10 further studies the optimal value for $T$ and provides recommendations.

Figure 7: Changes in model accuracy under different $\tau$ values and varying numbers of neighbors $T$.

Figure 7 shows that both the division of the paths ($\tau$) and the number of selected neighbor nodes ($T$) have a large impact on model performance. On the three datasets, performance is best when $T$ lies in the range of 300 to 500. This is because too few neighbors reduce the model's learning capacity, while too many neighbors decrease performance due to the increase in noisy nodes. Choosing an appropriate number of neighbor nodes is therefore important for model performance. At the same time, the difference in the highest accuracy on the ACM and Yelp datasets is small under different $\tau$ values. We believe this is because the nodes under the LargePaths of these two datasets are strongly correlated and contain less noise information. In addition, the performance on the IMDB dataset is similar under different $T$ values. This is due to the small number of neighbors in the meta-paths, so that similar neighbor nodes are retained under different $T$ values after feature and topology selection.

5.8 Big Neighborhood Path Performance Test

Furthermore, this paper examines the performance variations of the model under large neighbor paths and small neighbor paths. Specifically, the model variant LSPI-w/o-S is used as the experimental subject. Because meta-paths with significant semantic differences cannot be measured uniformly, only meta-paths with similar semantics are selected as inputs to observe the performance variations of the model under large neighbor paths. The selected meta-path pairs for the three datasets are (PAP, PAPAP), (MAM, MAMAM), and (BUB, BUBUB), respectively. The detailed experimental results are shown in Table 5.

Table 5: Performance changes under large neighbor paths.
| Dataset | Meta-path | HAN Macro-F1 | HAN Micro-F1 | LSPI-w/o-S Macro-F1 | LSPI-w/o-S Micro-F1 |
|---|---|---|---|---|---|
| ACM | PAP | 92.29 | 92.24 | 91.06 | 91.09 |
| | PAPAP | 90.24 | 90.29 | 91.98 | 92.02 |
| | Discrepancy | -2.05 | -1.95 | 0.92 | 0.93 |
| IMDB | MAM | 52.04 | 53.16 | 52.46 | 52.89 |
| | MAMAM | 49.4 | 50.79 | 52.14 | 53.02 |
| | Discrepancy | -2.64 | -2.37 | -0.32 | 0.13 |
| Yelp | BUB | 63.95 | 72.02 | 92.42 | 91.51 |
| | BUBUB | 77.83 | 74.86 | 92.44 | 91.59 |
| | Discrepancy | 13.88 | 2.84 | 0.02 | 0.08 |

5.9 Robustness Study

Considering that the large neighbor path module removes neighbor nodes, this section further verifies the robustness of the model. We randomly deleted $1/5$, $1/10$, $1/20$, and $1/50$ of the nodes and their adjacent edges in the ACM dataset to create four new datasets, denoted as ACM_5, ACM_10, ACM_20, and ACM_50. These datasets were then input into LSPI and HAN to test the robustness of the models. The detailed experimental results are shown in Figure 8.

Figure 8: Experimental results of LSPI and HAN on randomly deleting part of the node data set.

From Figure 8, it can be observed that the more nodes are deleted, the more significant the performance decline of the model. However, across all four experimental results, LSPI consistently outperforms HAN. Additionally, it can be seen that the rate of decline for LSPI is significantly lower than that of HAN, indicating that LSPI has stronger resistance to interference.
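The perturbation used in this robustness study, deleting a random fraction of nodes together with their adjacent edges, can be sketched as below. The edge-list representation, the function name, and the fixed seed are assumptions of this example; the paper does not describe how the deletion was implemented.

```python
import random

def drop_nodes(nodes, edges, fraction, seed=0):
    """Remove a random fraction of nodes together with their incident edges.

    nodes: iterable of node ids.
    edges: iterable of (u, v) pairs.
    fraction: share of nodes to delete, e.g. 1/5 for the ACM_5 setting.
    """
    nodes = list(nodes)
    rng = random.Random(seed)
    removed = set(rng.sample(nodes, round(len(nodes) * fraction)))
    kept_nodes = [n for n in nodes if n not in removed]
    # An edge survives only if both of its endpoints survive
    kept_edges = [(u, v) for u, v in edges
                  if u not in removed and v not in removed]
    return kept_nodes, kept_edges

# Toy ring graph with 100 nodes, perturbed at the four deletion rates
ring_nodes = range(100)
ring_edges = [(i, (i + 1) % 100) for i in range(100)]
for frac in (1 / 5, 1 / 10, 1 / 20, 1 / 50):
    kept_n, kept_e = drop_nodes(ring_nodes, ring_edges, frac)
    print(f"fraction={frac:.2f}: {len(kept_n)} nodes, {len(kept_e)} edges")
```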

5.10 Node Number Research

To provide reference values for $T$ under different datasets, this paper takes the $\tau$ value with the highest accuracy in Section 5.7 and further studies the model performance using four criteria (D_Max, D_Min, D_Avg, D_Med). These criteria represent the maximum degree, minimum degree, average degree, and median degree of all large neighbor paths selected under the designated $\tau$ value. The specific information is shown in Table 6. In the experiments, D_Avg is rounded to an integer.

Table 6: The large neighborhood paths selected and their corresponding indicator values under different datasets.
| Datasets | $\tau$ | Big_Path | D_Max | D_Min | D_Avg | D_Med |
|---|---|---|---|---|---|---|
| ACM | 30 | PAPAP, PSPSP | 2595 | 2 | 1172.82 | 900 |
| IMDB | 200 | MAMAM | 1365 | 1 | 280.2 | 190 |
| Yelp | 100 | BUBUB, BSBSB | 3692 | 171 | 2833.68 | 2952 |

The experimental results are shown in Table 7. Although the model accuracy under the four criteria did not exceed the highest value in Section 5.7, it can be observed that when $T$ is set to D_Avg and D_Med, the model achieves higher accuracy scores across all datasets. Therefore, this paper suggests that $T$ should lean towards the average degree and median degree of large neighbor paths.

Table 7: Experimental results with different indicators.
| Datasets | Metric | $T=$ D_Max | $T=$ D_Min | $T=$ D_Avg | $T=$ D_Med |
|---|---|---|---|---|---|
| ACM | Macro-F1 | 93.9 | 91.79 | 93.79 | 93.86 |
| | Micro-F1 | 93.82 | 91.75 | 93.65 | 93.73 |
| IMDB | Macro-F1 | 63.2 | 61.94 | 63.2 | 63.2 |
| | Micro-F1 | 63.51 | 62.13 | 63.51 | 63.51 |
| Yelp | Macro-F1 | 94.45 | 94.7 | 94.47 | 94.45 |
| | Micro-F1 | 93.77 | 94.14 | 93.79 | 93.77 |

6 Conclusion

This paper addresses two problems: the large discrepancy in the number of neighbors across different meta-paths, and the noise present in large neighborhood paths. To tackle them, it proposes LSPI, a heterogeneous graph neural network algorithm based on discriminating between large and small neighborhood paths. LSPI first divides meta-paths into large and small neighborhood paths using a path discriminator, and then selects nodes from the large neighborhood paths based on both topology and features to mitigate noise interference. Subsequently, feature information from the different paths is obtained through graph convolution modules and fused using a subgraph-level attention mechanism. Comprehensive experimental results demonstrate that LSPI performs well and handles large neighborhood paths significantly better than the compared methods.

Acknowledgements

This work is supported by the National Key R&D Program of China [2022ZD0119501]; NSFC [52374221]; the Sci. & Tech. Development Fund of Shandong Province of China [ZR2022MF288, ZR2023MF097]; and the Taishan Scholar Program of Shandong Province [ts20190936].

References

  • [1] X. Liang, Y. Ma, G. Cheng, C. Fan, Y. Yang, Z. Liu, Meta-path-based heterogeneous graph neural networks in academic network, International Journal of Machine Learning and Cybernetics (2022) 1–17.
  • [2] X. Chen, T. Tang, J. Ren, I. Lee, H. Chen, F. Xia, Heterogeneous graph learning for explainable recommendation over academic networks, in: IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, 2021, pp. 29–36.
  • [3] Z. Li, C. Lu, Y. Yi, J. Gong, A hierarchical framework for interactive behaviour prediction of heterogeneous traffic participants based on graph neural network, IEEE Transactions on Intelligent Transportation Systems 23 (7) (2021) 9102–9114.
  • [4] B. Hu, H. Wang, L. Wang, W. Yuan, Adverse drug reaction predictions using stacking deep heterogeneous information network embedding approach, Molecules 23 (12) (2018) 3193.
  • [5] W. Luo, H. Zhang, X. Yang, L. Bo, X. Yang, Z. Li, X. Qie, J. Ye, Dynamic heterogeneous graph neural network for real-time event prediction, in: Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining, 2020, pp. 3213–3223.
  • [6] J. Guo, L. Du, W. Bi, Q. Fu, X. Ma, X. Chen, S. Han, D. Zhang, Y. Zhang, Homophily-oriented heterogeneous graph rewiring, in: Proceedings of the ACM Web Conference 2023, 2023, pp. 511–522.
  • [7] Y. Liu, X. Ao, F. Feng, Y. Ma, K. Li, T.-S. Chua, Q. He, Flood: A flexible invariant learning framework for out-of-distribution generalization on graphs, in: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023, pp. 1548–1558.
  • [8] B. Lin, Y. Li, N. Gui, Z. Xu, Z. Yu, Multi-view graph representation learning beyond homophily, ACM Transactions on Knowledge Discovery from Data 17 (8) (2023) 1–21.
  • [9] X. Sun, H. Cheng, J. Li, B. Liu, J. Guan, All in one: Multi-task prompting for graph neural networks, in: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023, pp. 2120–2131.
  • [10] X. Wang, H. Ji, C. Shi, B. Wang, Y. Ye, P. Cui, P. S. Yu, Heterogeneous graph attention network, in: The world wide web conference, 2019, pp. 2022–2032.
  • [11] X. Fu, J. Zhang, Z. Meng, I. King, Magnn: Metapath aggregated graph neural network for heterogeneous graph embedding, in: Proceedings of the web conference 2020, 2020, pp. 2331–2341.
  • [12] H. Ji, X. Wang, C. Shi, B. Wang, P. S. Yu, Heterogeneous graph propagation network, IEEE Transactions on Knowledge and Data Engineering 35 (1) (2023) 521–532. doi:10.1109/TKDE.2021.3079239.
  • [13] Z. Hu, Y. Dong, K. Wang, Y. Sun, Heterogeneous graph transformer, in: Proceedings of the web conference 2020, 2020, pp. 2704–2710.
  • [14] C. Zhang, D. Song, C. Huang, A. Swami, N. V. Chawla, Heterogeneous graph neural network, in: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, 2019, pp. 793–803.
  • [15] H. Hong, H. Guo, Y. Lin, X. Yang, Z. Li, J. Ye, An attention-based graph neural network for heterogeneous structural learning, in: Proceedings of the AAAI conference on artificial intelligence, Vol. 34, 2020, pp. 4132–4139.
  • [16] X. Zhou, F. Shen, L. Liu, W. Liu, L. Nie, Y. Yang, H. T. Shen, Graph convolutional network hashing, IEEE transactions on cybernetics 50 (4) (2018) 1460–1472.
  • [17] C. Fu, G. Zheng, C. Huang, Y. Yu, J. Dong, Multiplex heterogeneous graph neural network with behavior pattern modeling, in: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023, pp. 482–494.
  • [18] Z. Wang, D. Yu, Q. Li, S. Shen, S. Yao, Sr-hgn: Semantic-and relation-aware heterogeneous graph neural network, Expert Systems with Applications 224 (2023) 119982.
  • [19] T. N. Kipf, M. Welling, Semi-supervised classification with graph convolutional networks, arXiv preprint arXiv:1609.02907 (2016).
  • [20] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Lio, Y. Bengio, et al., Graph attention networks, stat 1050 (20) (2017) 10–48550.
  • [21] Y. Ding, X. Zhao, Z. Zhang, W. Cai, N. Yang, Graph sample and aggregate-attention network for hyperspectral image classification, IEEE Geoscience and Remote Sensing Letters 19 (2021) 1–5.
  • [22] M. Zhang, X. Wang, M. Zhu, C. Shi, Z. Zhang, J. Zhou, Robust heterogeneous graph neural networks against adversarial attacks, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 36, 2022, pp. 4363–4370.
  • [23] J. Atwood, D. Towsley, Diffusion-convolutional neural networks, Advances in neural information processing systems 29 (2016).
  • [24] Z. Peng, H. Liu, Y. Jia, J. Hou, Attention-driven graph clustering network, in: Proceedings of the 29th ACM international conference on multimedia, 2021, pp. 935–943.
  • [25] J. Zhao, X. Wang, C. Shi, B. Hu, G. Song, Y. Ye, Heterogeneous graph structure learning for graph neural networks, in: Proceedings of the AAAI conference on artificial intelligence, Vol. 35, 2021, pp. 4697–4705.
  • [26] Y. Yang, Z. Guan, J. Li, W. Zhao, J. Cui, Q. Wang, Interpretable and efficient heterogeneous graph convolutional network, IEEE Transactions on Knowledge and Data Engineering 35 (2) (2023) 1637–1650. doi:10.1109/TKDE.2021.3101356.
  • [27] Z. Wang, D. Yu, Q. Li, S. Shen, S. Yao, Sr-hgn: Semantic-and relation-aware heterogeneous graph neural network, Expert Systems with Applications 224 (2023) 119982.
  • [28] L. Van der Maaten, G. Hinton, Visualizing data using t-sne., Journal of machine learning research 9 (11) (2008).