Alleviating neighbor bias: augmenting graph self-supervise learning with structural equivalent positive samples

Paper title

Alleviating neighbor bias: augmenting graph self-supervise learning with structural equivalent positive samples

Authors

Jiawei Zhu, Mei Hong, Ronghua Du, Haifeng Li

Abstract

In recent years, using a self-supervised learning framework to learn general-purpose graph features has been considered a promising paradigm for graph representation learning. The core of self-supervised learning for graph neural networks (GNNs) lies in constructing a suitable strategy for selecting positive samples. However, existing GNNs typically aggregate information from neighboring nodes to update node representations, which leads to an over-reliance on neighboring positive samples, i.e., homophilous samples, while ignoring long-range positive samples, i.e., nodes that are far apart on the graph but structurally equivalent. We call this problem "neighbor bias." Neighbor bias can reduce the generalization performance of GNNs. In this paper, we argue that the generalization properties of GNNs should be determined by combining homophilous samples and structurally equivalent samples, which we call the "GC combination hypothesis." We therefore propose a topological signal-driven self-supervised method that uses a structural equivalence sampling strategy guided by topological information. First, we extract multiscale topological features using persistent homology. Then we compute the structural equivalence of node pairs from their topological features. In particular, we design a topological loss function that pulls non-neighboring node pairs with high structural equivalence closer together in the representation space, alleviating neighbor bias. Finally, we use a joint training mechanism to adjust the influence of structural equivalence on the model so that it fits datasets with different characteristics. We conducted node classification experiments on seven graph datasets. The results show that the topological signal enhancement strategy effectively improves model performance.
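
The pipeline described in the abstract (score node pairs by structural equivalence, pull highly equivalent non-neighbors together, then mix that term into the main objective) might be sketched roughly as below. This is only an illustration under stated assumptions, not the authors' formulation: the abstract gives no formulas, so the similarity `exp(-distance)`, the threshold `tau`, the cosine-based pull term, and the weight `lam` are all hypothetical choices, and the per-node topological feature vectors are assumed to have been computed beforehand (for example, from persistent homology).

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def structural_equivalence(f_i, f_j):
    """Hypothetical score in (0, 1]: exp(-Euclidean distance) between
    precomputed topological feature vectors of two nodes."""
    return math.exp(-math.dist(f_i, f_j))


def topological_loss(z, topo_feats, edges, tau=0.8):
    """Assumed pull term: average of s_ij * (1 - cos(z_i, z_j)) over
    non-neighboring pairs whose structural equivalence s_ij is >= tau."""
    neighbors = {frozenset(e) for e in edges}
    total, count = 0.0, 0
    n = len(z)
    for i in range(n):
        for j in range(i + 1, n):
            if frozenset((i, j)) in neighbors:
                continue  # only non-neighboring pairs are considered
            s = structural_equivalence(topo_feats[i], topo_feats[j])
            if s >= tau:
                total += s * (1.0 - cosine(z[i], z[j]))
                count += 1
    return total / count if count else 0.0


def joint_loss(base_loss, topo_loss, lam=0.5):
    """Assumed joint objective: lam controls how strongly structural
    equivalence influences training on a given dataset."""
    return base_loss + lam * topo_loss
```

In this sketch, `lam` plays the role of the joint training knob: setting it to zero recovers the base self-supervised objective, while larger values give structurally equivalent long-range pairs more influence.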
