论文标题
快速有效的平行广度优先搜索,并使用幂律图形转换
Fast and Efficient Parallel Breadth-First Search with Power-law Graph Transformation
论文作者
论文摘要
在大数据时代,在各种情况下,例如社交网络,知识图,网络搜索和推荐系统,图表计算被广泛用于利用现实图形中的隐藏值。但是,随机内存访问导致缓存效率低下,并且不规则的度分布会导致大量负载不平衡。广度优先的搜索(BFS)经常用于许多重要且复杂的图算法的内核。在本文中,我们描述了使用反向Cuthill-Mckee(RCM)算法来改善数据局部性的预处理方法,并演示了如何实现BFS有效的负载平衡。通过SIMD执行,对RCM重新计算的图形数据的计算也会加速。我们评估了图500基准和现实图形图的Kronecker图上图形预处理方法的性能。我们在RCM重新计算图数据上实现的BFS实现在ARMV8系统上达到326.48 mTEPS/W(每瓦的Mega Teps),在2020年6月在Green Graph500列表中排名第二(第1级等级使用GPU加速度)。
In the big data era, graph computing is widely used to exploit the hidden value in real-world graphs in various scenarios such as social networks, knowledge graphs, web searching, and recommendation systems. However, the random memory accesses result in inefficient use of cache and the irregular degree distribution leads to substantial load imbalance. Breadth-First Search (BFS) is frequently utilized as a kernel for many important and complex graph algorithms. In this paper, we describe a preprocessing approach using Reverse Cuthill-Mckee (RCM) algorithm to improve data locality and demonstrate how to achieve an efficient load balancing for BFS. Computations on RCM-reordered graph data are also accelerated with SIMD executions. We evaluate the performance of the graph preprocessing approach on Kronecker graphs of the Graph500 benchmark and real-world graphs. Our BFS implementation on RCM-reordered graph data achieves 326.48 MTEPS/W (mega TEPS per watt) on an ARMv8 system, ranking 2nd on the Green Graph500 list in June 2020 (the 1st rank uses GPU acceleration).