精确的指数算法用于聚类问题

论文标题

精确的指数算法用于聚类问题

Exact Exponential Algorithms for Clustering Problems

论文作者

Fomin, Fedor V., Golovach, Petr A., Inamdar, Tanmay, Purohit, Nidhi, Saurabh, Saket

论文摘要

在本文中，我们启动了针对众所周知的聚类问题的精确算法的系统研究，即$ k $ -Median和$ k $ -Means。在$ k $ -Median中，输入由属于公制空间的$ n $点的集合组成，任务是选择一个子集$ c \ subseteq x $ $ k $点作为中心，以便将每个点的距离的距离的总和最小化。在$ k $ -Means中，目标是最大程度地减少距离的正方形之和。设计在时间上运行的算法$ \ max_ {k \ leq n} {n \ select k} n^{o（1）} = o^*（2^n）$（$ o^*（\ cdot）$ nides nide hides nides nides hide hide hide hide hide hide hide hide hide hide hide hide hives yide y y $ n $ n $ n $）。我们为这些问题设计了第一个非平凡的精确算法。特别是，我们获得了$ o^*（（1.89）^n）$ time精确算法，用于$ k $ -Median，可用于任何值的$ k $。我们的算法非常普遍，因为它不使用基础（度量）空间的任何属性 - 它甚至不需要距离来满足三角形不平等。特别是，同样的算法也适用于$ k $ -MEANS。我们通过证明算法的运行时间是渐近的最佳，直到指数的基础，我们可以补充这一结果。也就是说，除非ETH失败，否则在时间上运行的这些问题$ 2^{o（n）} \ cdot n^{o（1）} $没有算法。最后，我们考虑了这些聚类问题的“供应商”版本，除了集合$ x $，我们还可以给我们一组$ m $候选中心$ f $，目标是从$ f $中找到$ k $中心的子集。目标仍然是最大程度地减少$ k $ -Median/$ k $ -means/$ k $ - 中心目标。对于这些版本，我们使用子集卷积给出了$ O（2^n（mn）^{o（1）}）$ time算法。我们通过表明在集合封面的猜想下，这些问题的供应商版本不接受准确的算法$ 2^{（1-ε）n}（mn）^{o（1）} $来补充这一结果。

In this paper we initiate a systematic study of exact algorithms for well-known clustering problems, namely $k$-Median and $k$-Means. In $k$-Median, the input consists of a set $X$ of $n$ points belonging to a metric space, and the task is to select a subset $C \subseteq X$ of $k$ points as centers, such that the sum of the distances of every point to its nearest center is minimized. In $k$-Means, the objective is to minimize the sum of squares of the distances instead. It is easy to design an algorithm running in time $\max_{k\leq n} {n \choose k} n^{O(1)} = O^*(2^n)$ ($O^*(\cdot)$ notation hides polynomial factors in $n$). We design first non-trivial exact algorithms for these problems. In particular, we obtain an $O^*((1.89)^n)$ time exact algorithm for $k$-Median that works for any value of $k$. Our algorithm is quite general in that it does not use any properties of the underlying (metric) space -- it does not even require the distances to satisfy the triangle inequality. In particular, the same algorithm also works for $k$-Means. We complement this result by showing that the running time of our algorithm is asymptotically optimal, up to the base of the exponent. That is, unless ETH fails, there is no algorithm for these problems running in time $2^{o(n)} \cdot n^{O(1)}$. Finally, we consider the "supplier" versions of these clustering problems, where, in addition to the set $X$ we are additionally given a set of $m$ candidate centers $F$, and objective is to find a subset of $k$ centers from $F$. The goal is still to minimize the $k$-Median/$k$-Means/$k$-Center objective. For these versions we give a $O(2^n (mn)^{O(1)})$ time algorithms using subset convolution. We complement this result by showing that, under the Set Cover Conjecture, the supplier versions of these problems do not admit an exact algorithm running in time $2^{(1-ε) n} (mn)^{O(1)}$.

下载PDF全文

下载文献需遵守相关版权规定

论文标题