SI-spreading-based network embedding in static and temporal networks

Paper title

SI-spreading-based network embedding in static and temporal networks

Authors

Xiu-Xiu Zhan, Ziyu Li, Naoki Masuda, Petter Holme, Huijuan Wang

Abstract

Link prediction can be used to extract missing information, identify spurious interactions, and forecast network evolution. Network embedding is a methodology to assign coordinates to nodes in a low-dimensional vector space. By embedding nodes into vectors, the link prediction problem can be converted into a similarity comparison task: nodes with similar embedding vectors are more likely to be connected. Classic network embedding algorithms are random-walk-based. They sample trajectory paths via random walks and generate node pairs from the trajectory paths. The node pair set is further used as the input for a Skip-Gram model, a representative language model that embeds nodes (which are regarded as words) into vectors. In the present study, we propose to replace random walk processes with a spreading process, namely the susceptible-infected (SI) model, to sample paths. Specifically, we propose two SI-spreading-based algorithms, SINE and TSINE, to embed static and temporal networks, respectively. The performance of our algorithms is evaluated by the missing link prediction task in comparison with state-of-the-art static and temporal network embedding algorithms. Results show that SINE and TSINE outperform the baselines across all six empirical datasets. We further find that the performance of SINE is mostly better than that of TSINE, suggesting that temporal information does not necessarily improve the embedding for missing link prediction. Moreover, we study the effect of the sampling size, quantified as the total length of the trajectory paths, on the performance of the embedding algorithms. SINE and TSINE achieve their better performance with a smaller sampling size than the baseline algorithms require. Hence, SI-spreading-based embedding tends to be more applicable to large-scale networks.
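
The path-sampling idea described in the abstract can be illustrated with a minimal sketch for the static case. The abstract does not give the exact procedure, so everything below is an assumption made for illustration: a discrete-time SI process, trajectory paths taken as root-to-leaf paths of the resulting infection tree, and node pairs drawn from a sliding window along each path. The function names (build_adjacency, si_infection_tree, root_to_leaf_paths, node_pairs) and the parameters beta, max_steps and window are hypothetical and not taken from the paper.

```python
import random


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def si_infection_tree(adj, seed, beta, max_steps, rng):
    """Run a discrete-time SI process (assumed variant).

    At each step, every infected node infects each susceptible neighbour
    with probability beta. Returns a map from each infected node to the
    node that infected it (the seed maps to None).
    """
    parent = {seed: None}
    for _ in range(max_steps):
        newly = {}
        # Sorted iteration keeps runs reproducible for a fixed rng.
        for u in sorted(parent):
            for v in sorted(adj[u]):
                if v not in parent and v not in newly and rng.random() < beta:
                    newly[v] = u
        parent.update(newly)
        if len(parent) == len(adj):
            break  # every node is infected; SI has no recovery
    return parent


def root_to_leaf_paths(parent):
    """Read trajectory paths off the infection tree, one per leaf."""
    has_child = {p for p in parent.values() if p is not None}
    paths = []
    for node in parent:
        if node not in has_child:
            path = []
            cur = node
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            paths.append(path[::-1])  # order from seed to leaf
    return paths


def node_pairs(paths, window):
    """Generate (node, context) pairs within `window` hops along each path."""
    pairs = []
    for path in paths:
        for i, u in enumerate(path):
            lo = max(0, i - window)
            hi = min(len(path), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((u, path[j]))
    return pairs


# Small illustrative network (made up for this sketch).
demo_edges = [(0, 1), (1, 2), (2, 3), (1, 4)]
demo_adj = build_adjacency(demo_edges)
demo_tree = si_infection_tree(demo_adj, seed=0, beta=0.5, max_steps=20,
                              rng=random.Random(42))
demo_pairs = node_pairs(root_to_leaf_paths(demo_tree), window=2)
```

In a full pipeline, pairs such as demo_pairs would serve as the training input for a Skip-Gram model, and the resulting node vectors would be compared by similarity to score candidate links. Repeating the sampling from many seed nodes until a target total path length is reached would correspond to the sampling size discussed in the abstract.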
