论文标题

在时间图流中进行近似基序计数的有效采样算法

Efficient Sampling Algorithms for Approximate Motif Counting in Temporal Graph Streams

论文作者

Wang, Jingjing, Wang, Yanhao, Jiang, Wenjun, Li, Yuchen, Tan, Kian-Lee

论文摘要

从通信网络中的用户交互到金融市场的交易,可以将各种各样的复杂系统建模为由一组顶点和一系列时间戳和有向的边缘组成的时间图。时间基序是从静态图中的子图模式中概括的,这些图案除了拓扑外考虑边缘顺序和持续时间。计算时间基序的发生数量是时间网络分析的基本问题。但是,现有方法要么不能支持时间主题或遭受性能问题的困扰。此外,它们无法在流媒体模型中工作,其中随着时间的推移会逐步观察边缘。在本文中,我们专注于通过随机抽样进行近似时间基序计数。我们首先提出了两个在离线设置中计数时间基础计数的采样算法。第一个是用于估计任何时间基序的实例数量的边缘采样(ES)算法。第二个是一种改进的边缘 - 宽样采样(EWS)算法,该算法将边缘采样与楔形采样杂交,以计算$ 3 $顶点的时间图案和$ 3 $ edges。此外,我们通过扩展称为SES和SEWS的ES和EWS算法,提出两种算法在时间图流中逐渐计数的时间基序。我们提供了我们提出算法的理论界限和复杂性的全面分析。最后,我们对几个现实世界图表上提出的算法进行了广泛的实验评估。结果表明,ES和EW具有更高的效率,更高的准确性和更高的可扩展性,而在离线设置中的时间基序进行了最新的采样方法。此外,SES和SEWS在ES和EWS上最多达到了三个数量级的加速,同时在流式设置中对时间基序进行了可比的估计误差。

A great variety of complex systems, from user interactions in communication networks to transactions in financial markets, can be modeled as temporal graphs consisting of a set of vertices and a series of timestamped and directed edges. Temporal motifs are generalized from subgraph patterns in static graphs which consider edge orderings and durations in addition to topologies. Counting the number of occurrences of temporal motifs is a fundamental problem for temporal network analysis. However, existing methods either cannot support temporal motifs or suffer from performance issues. Moreover, they cannot work in the streaming model where edges are observed incrementally over time. In this paper, we focus on approximate temporal motif counting via random sampling. We first propose two sampling algorithms for temporal motif counting in the offline setting. The first is an edge sampling (ES) algorithm for estimating the number of instances of any temporal motif. The second is an improved edge-wedge sampling (EWS) algorithm that hybridizes edge sampling with wedge sampling for counting temporal motifs with $3$ vertices and $3$ edges. Furthermore, we propose two algorithms to count temporal motifs incrementally in temporal graph streams by extending the ES and EWS algorithms referred to as SES and SEWS. We provide comprehensive analyses of the theoretical bounds and complexities of our proposed algorithms. Finally, we perform extensive experimental evaluations of our proposed algorithms on several real-world temporal graphs. The results show that ES and EWS have higher efficiency, better accuracy, and greater scalability than state-of-the-art sampling methods for temporal motif counting in the offline setting. Moreover, SES and SEWS achieve up to three orders of magnitude speedups over ES and EWS while having comparable estimation errors for temporal motif counting in the streaming setting.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源