To Push or To Pull: On Reducing Communication and Synchronization in Graph Computations

Paper title

To Push or To Pull: On Reducing Communication and Synchronization in Graph Computations

Authors

Maciej Besta, Michal Podstawski, Linus Groner, Edgar Solomonik, Torsten Hoefler

Abstract

We reduce the cost of communication and synchronization in graph processing by analyzing the fastest way to process graphs: pushing the updates to a shared state or pulling the updates to a private state. We investigate the applicability of this push-pull dichotomy to various algorithms and its impact on complexity, performance, and the number of locks, atomics, and reads/writes used. We consider 11 graph algorithms, 3 programming models, 2 graph abstractions, and various families of graphs. The conducted analysis illustrates surprising differences between push and pull variants of different algorithms in performance, speed of convergence, and code complexity; the insights are backed up by performance data from hardware counters. We use these findings to illustrate which variant is faster for each algorithm and to develop generic strategies that enable even higher speedups. Our insights can be used to accelerate graph processing engines or libraries on both massively-parallel shared-memory machines and distributed-memory systems.
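
To make the push/pull dichotomy concrete, here is a minimal sketch of one PageRank iteration written both ways. It assumes Python threads as stand-ins for parallel workers and plain out-adjacency lists; the names `pagerank_push` and `pagerank_pull` and the toy graph are hypothetical and not taken from the paper. In the push variant each worker scatters contributions into a shared array, so every write must be synchronized. In the pull variant each worker gathers into the vertices it owns, so it needs no locks or atomics, at the cost of first building the in-neighbour (transposed) graph.

```python
import threading

# Illustrative sketch only (hypothetical names, not the paper's code).
# Dangling vertices (no out-edges) contribute nothing in either variant.


def _run(work, nthreads):
    """Start one thread per worker id and wait for all of them."""
    threads = [threading.Thread(target=work, args=(t,)) for t in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def pagerank_pull(out_adj, ranks, d=0.85, nthreads=2):
    """One PageRank iteration in pull style: each vertex gathers from in-neighbours."""
    n = len(out_adj)
    # Pull needs in-neighbour lists, so build the transpose of the graph first.
    in_adj = [[] for _ in range(n)]
    for u, nbrs in enumerate(out_adj):
        for v in nbrs:
            in_adj[v].append(u)
    new = [0.0] * n

    def work(tid):
        # Thread `tid` owns vertices v with v % nthreads == tid and writes only
        # new[v]; shared state (ranks) is only read, so no locks or atomics.
        for v in range(tid, n, nthreads):
            s = sum(ranks[u] / len(out_adj[u]) for u in in_adj[v])
            new[v] = (1 - d) / n + d * s

    _run(work, nthreads)
    return new


def pagerank_push(out_adj, ranks, d=0.85, nthreads=2):
    """One PageRank iteration in push style: each vertex scatters to out-neighbours."""
    n = len(out_adj)
    new = [(1 - d) / n] * n
    lock = threading.Lock()

    def work(tid):
        # Thread `tid` owns source vertices u but writes new[v] for every
        # out-neighbour v, which other threads may update concurrently, so the
        # shared update is synchronized: a lock here, an atomic add in native code.
        for u in range(tid, n, nthreads):
            if not out_adj[u]:
                continue
            share = d * ranks[u] / len(out_adj[u])
            for v in out_adj[u]:
                with lock:
                    new[v] += share

    _run(work, nthreads)
    return new


if __name__ == "__main__":
    # Toy graph as out-adjacency lists (hypothetical example data).
    graph = [[1, 2], [2], [0], [0, 2]]
    ranks = [0.25] * 4
    print(pagerank_pull(graph, ranks))
    print(pagerank_push(graph, ranks))
```

Both calls should produce the same ranks up to floating-point rounding; the variants differ only in where writes land, which is exactly what determines how much locking or atomic traffic a parallel implementation pays for.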
