Private Multi-Winner Voting for Machine Learning

Paper Title

Private Multi-Winner Voting for Machine Learning

Authors

Adam Dziedzic, Christopher A. Choquette-Choo, Natalie Dullerud, Vinith Menon Suriyakumar, Ali Shahin Shamsabadi, Muhammad Ahmad Kaleem, Somesh Jha, Nicolas Papernot, Xiao Wang

Abstract

Private multi-winner voting is the task of revealing $k$-hot binary vectors satisfying a bounded differential privacy (DP) guarantee. This task has been understudied in machine learning literature despite its prevalence in many domains such as healthcare. We propose three new DP multi-winner mechanisms: Binary, $\tau$, and Powerset voting. Binary voting operates independently per label through composition. $\tau$ voting bounds votes optimally in their $\ell_2$ norm for tight data-independent guarantees. Powerset voting operates over the entire binary vector by viewing the possible outcomes as a power set. Our theoretical and empirical analysis shows that Binary voting can be a competitive mechanism on many tasks unless there are strong correlations between labels, in which case Powerset voting outperforms it. We use our mechanisms to enable privacy-preserving multi-label learning in the central setting by extending the canonical single-label technique: PATE. We find that our techniques outperform current state-of-the-art approaches on large, real-world healthcare data and standard multi-label benchmarks. We further enable multi-label confidential and private collaborative (CaPC) learning and show that model performance can be significantly improved in the multi-site setting.
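To make the three mechanisms named in the abstract more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes Gaussian noise added to vote counts (in the spirit of PATE-style aggregation) and a simple majority threshold for Binary voting. The function names `clip_l2`, `binary_voting`, and `powerset_voting` and the parameter `sigma` are hypothetical and chosen for illustration only. The sketch does not perform any privacy accounting, and for $\tau$ voting it shows only the $\ell_2$ clipping step the abstract describes.

```python
import math
import random
from collections import Counter
from itertools import product


def clip_l2(vote, tau):
    """Rescale one voter's vector so that its l2 norm is at most tau."""
    norm = math.sqrt(sum(x * x for x in vote))
    if norm <= tau:
        return [float(x) for x in vote]
    return [x * tau / norm for x in vote]


def binary_voting(votes, sigma, rng):
    """Decide each label independently by a noisy majority vote."""
    n, k = len(votes), len(votes[0])
    released = []
    for j in range(k):
        positives = sum(v[j] for v in votes)
        # Assumption: Gaussian noise on the per-label positive count.
        noisy = positives + rng.gauss(0.0, sigma)
        released.append(1 if noisy > n / 2 else 0)
    return released


def powerset_voting(votes, sigma, rng):
    """Treat each of the 2^k binary vectors as one class; noisy argmax."""
    k = len(votes[0])
    counts = Counter(tuple(v) for v in votes)
    best, best_score = None, -math.inf
    for outcome in product((0, 1), repeat=k):
        # Assumption: Gaussian noise on every outcome's vote count.
        score = counts[outcome] + rng.gauss(0.0, sigma)
        if score > best_score:
            best, best_score = outcome, score
    return list(best)


# Three hypothetical voters, each predicting a 3-label binary vector.
rng = random.Random(0)
votes = [[1, 0, 1], [1, 1, 0], [1, 0, 1]]
print(binary_voting(votes, sigma=1.0, rng=rng))
print(powerset_voting(votes, sigma=1.0, rng=rng))
```

The Powerset sketch enumerates all $2^k$ outcomes, so it is only practical for small $k$. Binary voting scales linearly in $k$ but, per the abstract, pays for the independent per-label decisions through composition.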
