收据：精炼粗粒的独立任务，用于平行尖端的两分图分解

论文标题

收据：精炼粗粒的独立任务，用于平行尖端的两分图分解

RECEIPT: REfine CoarsE-grained IndePendent Tasks for Parallel Tip decomposition of Bipartite Graphs

论文作者

Lakhotia, Kartik, Kannan, Rajgopal, Prasanna, Viktor, De Rose, Cesar A. F.

论文摘要

尖端分解是两分之网中采矿密集子图的关键内核，并在垃圾邮件检测中应用，分析网络分析等。它创建了顶点诱导的亚级别的层次结构，其密度不同，由蝴蝶（2,2,2-二甲虫）的参与确定。为了构建层次结构，现有算法迭代地遵循删除更新（剥离）过程：删除蝴蝶数量最少的顶点，并相应地更新其2跳邻居的蝴蝶计数。需要探索2跳社区的需要在计算上呈现尖端分解非常昂贵。此外，仅剥离最小蝴蝶顶点的固有顺序使得衍生的平行算法容易发展，可容易同步。在本文中，我们提出了一种新型的平行尖端分解算法 - 精炼粗粒的独立任务（收据），通过将顶点分配到可以同时剥离的多个独立子集中，可以放松剥离顺序限制。这使得可以同时获得同步的高度并行性和急剧减少。此外，收据采用了混合剥离策略以及其他优化，从而大大减少了楔形探索和执行时间的量。我们对共享内存多功能服务器上的收据进行详细的实验评估。它可以处理一些比最先进的算法更快的公共两部分数据集数量级的订单 - 分别可减少1100倍和64倍的线程同步和穿越楔形质量。使用36个线程，收据最多可提供17.1倍的自我利用加速。我们的收据实施可从https://github.com/kartiklakhotia/receipt获得。

Tip decomposition is a crucial kernel for mining dense subgraphs in bipartite networks, with applications in spam detection, analysis of affiliation networks etc. It creates a hierarchy of vertex-induced subgraphs with varying densities determined by the participation of vertices in butterflies (2,2-bicliques). To build the hierarchy, existing algorithms iteratively follow a delete-update(peeling) process: deleting vertices with the minimum number of butterflies and correspondingly updating the butterfly count of their 2-hop neighbors. The need to explore 2-hop neighborhood renders tip-decomposition computationally very expensive. Furthermore, the inherent sequentiality in peeling only minimum butterfly vertices makes derived parallel algorithms prone to heavy synchronization. In this paper, we propose a novel parallel tip-decomposition algorithm -- REfine CoarsE-grained Independent Tasks (RECEIPT) that relaxes the peeling order restrictions by partitioning the vertices into multiple independent subsets that can be concurrently peeled. This enables RECEIPT to simultaneously achieve a high degree of parallelism and dramatic reduction in synchronizations. Further, RECEIPT employs a hybrid peeling strategy along with other optimizations that drastically reduce the amount of wedge exploration and execution time. We perform detailed experimental evaluation of RECEIPT on a shared-memory multicore server. It can process some of the largest publicly available bipartite datasets orders of magnitude faster than the state-of-the-art algorithms -- achieving up to 1100x and 64x reduction in the number of thread synchronizations and traversed wedges, respectively. Using 36 threads, RECEIPT can provide up to 17.1x self-relative speedup. Our implementation of RECEIPT is available at https://github.com/kartiklakhotia/RECEIPT.

下载PDF全文

下载文献需遵守相关版权规定

论文标题