论文标题
平行批处理最小跨越森林和动态聚集图聚类的效率
Parallel Batch-Dynamic Minimum Spanning Forest and the Efficiency of Dynamic Agglomerative Graph Clustering
论文作者
论文摘要
层次结构聚类(HAC)是一种流行的聚类数据算法,但是尽管它很重要,但对于具有良好理论保证的HAC尚无动态算法。在本文中,我们研究了边缘加权图上的动态HAC。随着单个链接HAC减少到计算最小跨越森林(MSF)时,我们的第一个结果是用于维护MSF的平行批处理算法。在一批$ k $ edge插入或删除上,我们的批次摩擦MSF算法以$ o(k \ log^6 n)$预期的摊销工作和$ o(\ log^4 n)$跨度运行,具有很高的可能性。这是第一个完全动态的MSF算法处理边缘更新的批次,每个更新和各个Polyrogarithmic跨度。使用我们的MSF算法,我们获得了一种平行批处理算法,该算法可以回答有关单链接图HAC簇的查询。 我们的第二个结果是,对于其他通用链接函数,动态图HAC更难。例如,假设有很强的指数时间假设,动态图HAC需要$ω(n^{1-o(1)})$每个更新的工作或在图形上查询$ n $ VERTICES在图形上进行$ n $ VERTICE,以进行完整的链接,加权平均链接和平均链接。对于完整的链接和加权平均链接,即使对于增量或减少算法,该界限仍然可以保持,即使我们允许$ \ permatatorName {poly}(n)$ - 近似值。对于平均链接,对于增量和减少算法,绑定弱至$ω(n^{1/2 -o(1)})$,当允许$ n^{o(1)} $近似值时,界限仍然保持。
Hierarchical agglomerative clustering (HAC) is a popular algorithm for clustering data, but despite its importance, no dynamic algorithms for HAC with good theoretical guarantees exist. In this paper, we study dynamic HAC on edge-weighted graphs. As single-linkage HAC reduces to computing a minimum spanning forest (MSF), our first result is a parallel batch-dynamic algorithm for maintaining MSFs. On a batch of $k$ edge insertions or deletions, our batch-dynamic MSF algorithm runs in $O(k\log^6 n)$ expected amortized work and $O(\log^4 n)$ span with high probability. It is the first fully dynamic MSF algorithm handling batches of edge updates with polylogarithmic work per update and polylogarithmic span. Using our MSF algorithm, we obtain a parallel batch-dynamic algorithm that can answer queries about single-linkage graph HAC clusters. Our second result is that dynamic graph HAC is significantly harder for other common linkage functions. For example, assuming the strong exponential time hypothesis, dynamic graph HAC requires $Ω(n^{1-o(1)})$ work per update or query on a graph with $n$ vertices for complete linkage, weighted average linkage, and average linkage. For complete linkage and weighted average linkage, the bound still holds even for incremental or decremental algorithms and even if we allow $\operatorname{poly}(n)$-approximation. For average linkage, the bound weakens to $Ω(n^{1/2 - o(1)})$ for incremental and decremental algorithms, and the bounds still hold when allowing $n^{o(1)}$-approximation.