论文标题

当隐私符合部分信息时:对差异私人土匪的精致分析

When Privacy Meets Partial Information: A Refined Analysis of Differentially Private Bandits

论文作者

Azize, Achraf, Basu, Debabrota

论文摘要

我们研究具有$ε$ - 全球差异隐私(DP)的多军匪徒的问题。首先,我们证明了使用$ε$ -Global DP量化匪徒硬度的随机和线性土匪的最小值和问题依赖的后悔下限。这些界限表明存在两个硬度制度,具体取决于隐私预算$ε$。在高私人制度(小$ε$)中,硬度取决于隐私效果和有关奖励分布的部分信息。在低私人制度(大$ε$)中,具有$ε$ -Global DP的土匪并不比没有隐私的土匪更难。对于随机匪徒,我们进一步提出了一个通用框架,以设计基于索引的乐观强盗算法的近乎最佳的$ε$全局DP扩展。该框架由三种成分组成:拉普拉斯机制,扶手依赖性自适应发作以及仅在最后一集中收集的奖励来计算私人统计数据中收集的奖励。具体而言,我们实例化UCB和KL-UCB算法的$ε$ -Global DP扩展,即ADAP-UCB和ADAP-KLUCB。 Adap-klucb是两者都满足$ε$ -Global DP的第一个算法,并产生了遗憾的上限,与问题依赖性的下限与乘法常数相匹配。

We study the problem of multi-armed bandits with $ε$-global Differential Privacy (DP). First, we prove the minimax and problem-dependent regret lower bounds for stochastic and linear bandits that quantify the hardness of bandits with $ε$-global DP. These bounds suggest the existence of two hardness regimes depending on the privacy budget $ε$. In the high-privacy regime (small $ε$), the hardness depends on a coupled effect of privacy and partial information about the reward distributions. In the low-privacy regime (large $ε$), bandits with $ε$-global DP are not harder than the bandits without privacy. For stochastic bandits, we further propose a generic framework to design a near-optimal $ε$ global DP extension of an index-based optimistic bandit algorithm. The framework consists of three ingredients: the Laplace mechanism, arm-dependent adaptive episodes, and usage of only the rewards collected in the last episode for computing private statistics. Specifically, we instantiate $ε$-global DP extensions of UCB and KL-UCB algorithms, namely AdaP-UCB and AdaP-KLUCB. AdaP-KLUCB is the first algorithm that both satisfies $ε$-global DP and yields a regret upper bound that matches the problem-dependent lower bound up to multiplicative constants.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源