Membership-Doctor: Comprehensive Assessment of Membership Inference Against Machine Learning Models

Paper Title

Membership-Doctor: Comprehensive Assessment of Membership Inference Against Machine Learning Models

Authors

Xinlei He, Zheng Li, Weilin Xu, Cory Cornelius, Yang Zhang

Abstract

Machine learning models are prone to memorizing sensitive data, making them vulnerable to membership inference attacks in which an adversary aims to infer whether an input sample was used to train the model. Over the past few years, researchers have produced many membership inference attacks and defenses. However, these attacks and defenses employ a variety of strategies and are evaluated on different models and datasets. The lack of a comprehensive benchmark means we do not understand the strengths and weaknesses of existing attacks and defenses. We fill this gap by presenting a large-scale measurement of different membership inference attacks and defenses. We systematize membership inference through the study of nine attacks and six defenses and measure the performance of different attacks and defenses in a holistic evaluation. We then quantify the impact of the threat model on the results of these attacks. We find that some assumptions of the threat model, such as same-architecture and same-distribution between shadow and target models, are unnecessary. We are also the first to execute attacks on real-world data collected from the Internet, instead of laboratory datasets. We further investigate what determines the performance of membership inference attacks and reveal that the commonly believed overfitting level is not sufficient for the success of the attacks. Instead, the Jensen-Shannon distance of entropy/cross-entropy between member and non-member samples correlates with attack performance much better. This gives us a new way to accurately predict membership inference risks without running the attack. Finally, we find that data augmentation degrades the performance of existing attacks to a larger extent, and we propose an adaptive attack that uses augmentation to train shadow and attack models, which improves attack performance.
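The Jensen-Shannon distance mentioned in the abstract can be illustrated with a small sketch. The abstract does not spell out the paper's exact procedure, so everything here is an assumption made for illustration: the helper names (`entropy`, `histogram`, `js_distance`, `entropy_js_distance`), the 20 equal-width bins, the base-2 logarithm, and the use of the square root of the Jensen-Shannon divergence as the distance. The idea is to compute the prediction entropy of each posterior vector, histogram the entropies separately for member and non-member samples, and measure how far apart the two histograms are.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def histogram(values, bins, lo, hi):
    """Normalized histogram of values over [lo, hi] with equal-width bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # Clamp so that v == hi (or a tiny overshoot) lands in the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts]


def kl(p, q):
    """KL divergence in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_distance(p, q):
    """Square root of the Jensen-Shannon divergence (base 2), in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return math.sqrt(max(0.0, 0.5 * kl(p, m) + 0.5 * kl(q, m)))


def entropy_js_distance(member_posteriors, nonmember_posteriors, bins=20):
    """JS distance between member and non-member entropy histograms.

    Assumption: both inputs are lists of probability vectors produced by
    the target model, all with the same number of classes.
    """
    n_classes = len(member_posteriors[0])
    max_entropy = math.log(n_classes)  # entropy of the uniform distribution
    member_hist = histogram(
        [entropy(p) for p in member_posteriors], bins, 0.0, max_entropy
    )
    nonmember_hist = histogram(
        [entropy(p) for p in nonmember_posteriors], bins, 0.0, max_entropy
    )
    return js_distance(member_hist, nonmember_hist)
```

Under these assumptions, a value near 0 means the target model's entropy looks the same on members and non-members, and a value near 1 means the two groups are almost perfectly separable by entropy alone. A cross-entropy variant would replace `entropy` with the per-sample loss against the true label, which requires labels in addition to posteriors.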
