随机块模型的差异私人社区检测

论文标题

随机块模型的差异私人社区检测

Differentially Private Community Detection for Stochastic Block Models

论文作者

Seif, Mohamed, Nguyen, Dung, Vullikanti, Anil, Tandon, Ravi

论文摘要

鉴于用户之间的连通性（以图形的邻接矩阵表示），社区检测到图表的目的是恢复用户的基本标签/属性（例如，政治隶属关系）。当图形是从随机块模型（SBM）产生时，在理解社区检测的基本限制方面取得了重大进展。具体而言，已获得SBMS的尖锐信息理论限制和有效算法，该算法是$ p $和$ q $的函数，这代表了社区内部和社区内连接概率。在本文中，我们研究了社区检测问题，同时保留了顶点之间各个连接（边缘）的隐私。为了关注$（ε，δ）$ - 边缘差异隐私（DP）的概念，我们试图了解$（p，q）$，dp预算$（ε，δ）$的基本权衡以及社区标签精确恢复的计算效率。为此，我们介绍并分析了三个差异化私人社区恢复机制的相关信息理论权衡：a）基于稳定的机制； b）基于抽样的机制；和c）图形扰动机制。我们的主要发现是，稳定性和基于抽样的机制导致$（p，q）$和隐私预算$（ε，δ）$之间的卓越权衡；但是，这是以较高的计算复杂性为代价的。另一方面，尽管复杂性较低，但图形扰动机制要求隐私预算$ε$以缩放为$ω（\ log（n））$，以进行精确恢复。据我们所知，这是研究隐私限制对社区发现基本限制的影响的第一项工作。

The goal of community detection over graphs is to recover underlying labels/attributes of users (e.g., political affiliation) given the connectivity between users (represented by adjacency matrix of a graph). There has been significant recent progress on understanding the fundamental limits of community detection when the graph is generated from a stochastic block model (SBM). Specifically, sharp information theoretic limits and efficient algorithms have been obtained for SBMs as a function of $p$ and $q$, which represent the intra-community and inter-community connection probabilities. In this paper, we study the community detection problem while preserving the privacy of the individual connections (edges) between the vertices. Focusing on the notion of $(ε, δ)$-edge differential privacy (DP), we seek to understand the fundamental tradeoffs between $(p, q)$, DP budget $(ε, δ)$, and computational efficiency for exact recovery of the community labels. To this end, we present and analyze the associated information-theoretic tradeoffs for three broad classes of differentially private community recovery mechanisms: a) stability based mechanism; b) sampling based mechanisms; and c) graph perturbation mechanisms. Our main findings are that stability and sampling based mechanisms lead to a superior tradeoff between $(p,q)$ and the privacy budget $(ε, δ)$; however this comes at the expense of higher computational complexity. On the other hand, albeit low complexity, graph perturbation mechanisms require the privacy budget $ε$ to scale as $Ω(\log(n))$ for exact recovery. To the best of our knowledge, this is the first work to study the impact of privacy constraints on the fundamental limits for community detection.

下载PDF全文

下载文献需遵守相关版权规定

论文标题