通过指数机制和最大盖子差异私人$ k $ - 均值聚类

论文标题

通过指数机制和最大盖子差异私人$ k $ - 均值聚类

Differentially private $k$-means clustering via exponential mechanism and max cover

论文作者

Chaturvedi, Anamay, Nguyen, Huy, Xu, Eric

论文摘要

我们引入了一个新的$（ε_p，δ_p）$ - $ k $ - 均值聚类问题的私有算法。给定欧几里得空间中的数据集，$ k $ - 均值聚类问题需要一个在该空间中找到$ k $点，以使每个数据点与返回的$ k $之间的欧几里得距离之间的距离之和最接近。尽管存在具有良好理论保证的隐私方法来解决此问题[Balcan等，2017; Kaplan和Stemmer，2018年]，实际上可以看出，正是加性错误决定了这些方法的实际性能。通过将问题减少到网格上最大覆盖范围的一系列实例，我们能够得出一种新方法，该方法比以前的作品实现了较低的添加误差。 For input datasets with cardinality $n$ and diameter $Δ$, our algorithm has an $O(Δ^2 (k \log^2 n \log(1/δ_p)/ε_p + k\sqrt{d \log(1/δ_p)}/ε_p))$ additive error whilst maintaining constant multiplicative error.我们通过一些实验结束，并找到了对此问题先前实施的工作的改进。

We introduce a new $(ε_p, δ_p)$-differentially private algorithm for the $k$-means clustering problem. Given a dataset in Euclidean space, the $k$-means clustering problem requires one to find $k$ points in that space such that the sum of squares of Euclidean distances between each data point and its closest respective point among the $k$ returned is minimised. Although there exist privacy-preserving methods with good theoretical guarantees to solve this problem [Balcan et al., 2017; Kaplan and Stemmer, 2018], in practice it is seen that it is the additive error which dictates the practical performance of these methods. By reducing the problem to a sequence of instances of maximum coverage on a grid, we are able to derive a new method that achieves lower additive error then previous works. For input datasets with cardinality $n$ and diameter $Δ$, our algorithm has an $O(Δ^2 (k \log^2 n \log(1/δ_p)/ε_p + k\sqrt{d \log(1/δ_p)}/ε_p))$ additive error whilst maintaining constant multiplicative error. We conclude with some experiments and find an improvement over previously implemented work for this problem.

下载PDF全文

下载文献需遵守相关版权规定

论文标题