用于K群集问题的滑动窗口算法

论文标题

用于K群集问题的滑动窗口算法

Sliding Window Algorithms for k-Clustering Problems

论文作者

Borassi, Michele, Epasto, Alessandro, Lattanzi, Silvio, Vassilvitskii, Sergei, Zadimoghaddam, Morteza

论文摘要

计算的滑动窗口模型捕获了数据连续到达的方案，但只能将最新的$ W $元素用于分析。目的是设计算法，可以在每个到达时有效地更新解决方案，而不是从头开始重新计算它。在这项工作中，我们专注于$ k $ - 群集问题，例如$ k $ -Means和$ k $ -Median。在这种情况下，我们提供了简单且实用的算法，可提供比以前的结果更强的性能保证。从经验上讲，我们表明我们的方法仅存储数据的一小部分，是更快的数量级，并且找到的解决方案仅比算法返回的解决方案略高，而算法访问了整个数据集。

The sliding window model of computation captures scenarios in which data is arriving continuously, but only the latest $w$ elements should be used for analysis. The goal is to design algorithms that update the solution efficiently with each arrival rather than recomputing it from scratch. In this work, we focus on $k$-clustering problems such as $k$-means and $k$-median. In this setting, we provide simple and practical algorithms that offer stronger performance guarantees than previous results. Empirically, we show that our methods store only a small fraction of the data, are orders of magnitude faster, and find solutions with costs only slightly higher than those returned by algorithms with access to the full dataset.

下载PDF全文

下载文献需遵守相关版权规定

论文标题