论文标题

聚类排列:带有流应用程序的新技术

Clustering Permutations: New Techniques with Streaming Applications

论文作者

Chakraborty, Diptarka, Das, Debarati, Krauthgamer, Robert

论文摘要

我们在一组输入排名(即排列)上研究了经典的$ K $ -Median聚类问题,该排名具有无数应用程序,从社交选择理论到Web搜索和数据库。一种民间传说算法在多项式时间内为所有$ k = o(1)$提供了$ 2 $ - app的解决方案,而不论基本距离的措施如何工作,这是一个指标;但是,低于$ 2 $ factor是一个臭名昭著的挑战。我们考虑ULAM距离,ulam距离是众所周知的编辑距离度量标准的变体,其中弦乐被限制为排列。对于此指标,Chakraborty,Das和Krauthgamer [Soda,2021]为$ k = 1 $提供了$(2-δ)$ - 近似算法,其中$δ\ 2^{ - 40} $。 我们的主要贡献是用于聚集一组排列的新算法框架。我们的第一个结果是$ 1.999 $ -Approximation算法对于ULAM公制下的度量$ k $ -Median问题,该算法在时间$ $(k \ log(nd))^{o(k)} n d^3 $中,用于由$ N $组成的输入,该输入由$ n $ [d] $组成。实际上,我们的框架足够强大,可以将此结果扩展到仅使用polygarithmic(以$ n $)空间为单位的流型模型($ n $输入排列到一个)。此外,我们表明即使存在异常值,也可以获得相似的结果,这可能是一个更困难的问题。

We study the classical metric $k$-median clustering problem over a set of input rankings (i.e., permutations), which has myriad applications, from social-choice theory to web search and databases. A folklore algorithm provides a $2$-approximate solution in polynomial time for all $k=O(1)$, and works irrespective of the underlying distance measure, so long it is a metric; however, going below the $2$-factor is a notorious challenge. We consider the Ulam distance, a variant of the well-known edit-distance metric, where strings are restricted to be permutations. For this metric, Chakraborty, Das, and Krauthgamer [SODA, 2021] provided a $(2-δ)$-approximation algorithm for $k=1$, where $δ\approx 2^{-40}$. Our primary contribution is a new algorithmic framework for clustering a set of permutations. Our first result is a $1.999$-approximation algorithm for the metric $k$-median problem under the Ulam metric, that runs in time $(k \log (nd))^{O(k)}n d^3$ for an input consisting of $n$ permutations over $[d]$. In fact, our framework is powerful enough to extend this result to the streaming model (where the $n$ input permutations arrive one by one) using only polylogarithmic (in $n$) space. Additionally, we show that similar results can be obtained even in the presence of outliers, which is presumably a more difficult problem.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源