论文标题
通过多臂强盗进行嘈杂查询的最佳聚类
Optimal Clustering with Noisy Queries via Multi-Armed Bandit
论文作者
论文摘要
在许多应用程序的动机上,我们以错误的甲骨文研究聚类。在此问题中,有$ n $的项目属于$ k $未知群集,允许该算法询问Oracle是否两个项目属于同一集群。但是,仅使用概率$ \ frac {1} {2}+\fracδ{2} $的概率$ \ frac {1} {2} $才能正确。目的是恢复最少数量嘈杂的查询的隐藏群集。以前的工作表明,问题可以用$ o(\ frac {nk \ log n} {δ^2} + \ text {poly}(k,k,\ frac {1}Δ,\ log n))$ queries $ queries,而$ω(\ frac {nk} {nk} {Δ^2} {Δ^2})因此,对于任何$ K $和$δ$的值,上限和下限之间仍然存在非平凡的差距。在这项工作中,我们获得了广泛参数的第一个匹配上限和下限。特别是,提出了一种新的多项式时间算法(\ frac {n(k + \ log n)} {δ^2} + \ text {poly}(k,k,\ frac {1}δ,\ log log n))$ queries $ queries。此外,我们证明了$ω(\ frac {n \ log n} {δ^2})$的新下限$,它与现有$ω(\ frac {nk} {nk} {Δ^2})$结合使用,绑定了我们的上层$ \ \ \ \ \ \ \ \ \ \ \ \ frog} nog prog> log frac n prog,为了获得新的结果,我们的主要成分是我们的问题与多臂强盗之间的有趣联系,这可能为其他类似问题提供有用的见解。
Motivated by many applications, we study clustering with a faulty oracle. In this problem, there are $n$ items belonging to $k$ unknown clusters, and the algorithm is allowed to ask the oracle whether two items belong to the same cluster or not. However, the answer from the oracle is correct only with probability $\frac{1}{2}+\fracδ{2}$. The goal is to recover the hidden clusters with minimum number of noisy queries. Previous works have shown that the problem can be solved with $O(\frac{nk\log n}{δ^2} + \text{poly}(k,\frac{1}δ, \log n))$ queries, while $Ω(\frac{nk}{δ^2})$ queries is known to be necessary. So, for any values of $k$ and $δ$, there is still a non-trivial gap between upper and lower bounds. In this work, we obtain the first matching upper and lower bounds for a wide range of parameters. In particular, a new polynomial time algorithm with $O(\frac{n(k+\log n)}{δ^2} + \text{poly}(k,\frac{1}δ, \log n))$ queries is proposed. Moreover, we prove a new lower bound of $Ω(\frac{n\log n}{δ^2})$, which, combined with the existing $Ω(\frac{nk}{δ^2})$ bound, matches our upper bound up to an additive $\text{poly}(k,\frac{1}δ,\log n)$ term. To obtain the new results, our main ingredient is an interesting connection between our problem and multi-armed bandit, which might provide useful insights for other similar problems.