论文标题
Graphzeppelin:在动态图流上的连接组件的存储友好素描
GraphZeppelin: Storage-Friendly Sketching for Connected Components on Dynamic Graph Streams
论文作者
论文摘要
查找图的连接组件是整个计算机科学和工程中用途的基本问题。当图形非常大,或者当动态时,计算连接组件的任务变得更加困难,这意味着边缘集随时间变化而变化。一种自然的方法来计算大型动态图表上连接的组件是购买足够的RAM来存储整个图。但是,对于非常大的图,对RAM中的图形拟合的要求是过分的。因此,对于可以处理密集动态图的系统的需求未满足,尤其是当这些图形大于可用RAM时。 我们提出了一个新的高性能流式图形处理系统,用于计算图形的连接组件。我们称之为GraphZepelin的系统使用新的线性素描数据结构(Cubesketches)来求解流媒体连接的组件问题,因此需要渐近空间小于图形无损表示所需的空间。 Graphzeppelin用于大量密集图:Graphzeppelin每秒可以处理数百万的边缘更新(插入和删除),即使基础图太大而无法适合可用的RAM。结果,Graphzepelin大大增加了可以处理的图表的比例。
Finding the connected components of a graph is a fundamental problem with uses throughout computer science and engineering. The task of computing connected components becomes more difficult when graphs are very large, or when they are dynamic, meaning the edge set changes over time subject to a stream of edge insertions and deletions. A natural approach to computing the connected components on a large, dynamic graph stream is to buy enough RAM to store the entire graph. However, the requirement that the graph fit in RAM is prohibitive for very large graphs. Thus, there is an unmet need for systems that can process dense dynamic graphs, especially when those graphs are larger than available RAM. We present a new high-performance streaming graph-processing system for computing the connected components of a graph. This system, which we call GraphZeppelin, uses new linear sketching data structures (CubeSketches) to solve the streaming connected components problem and as a result requires space asymptotically smaller than the space required for a lossless representation of the graph. GraphZeppelin is optimized for massive dense graphs: GraphZeppelin can process millions of edge updates (both insertions and deletions) per second, even when the underlying graph is far too large to fit in available RAM. As a result GraphZeppelin vastly increases the scale of graphs that can be processed.