Paper Title

Membership Inference Attacks and Defenses in Classification Models

Paper Authors

Jiacheng Li, Ninghui Li, Bruno Ribeiro

Paper Abstract

We study the membership inference (MI) attack against classifiers, where the attacker's goal is to determine whether a data instance was used for training the classifier. Through a systematic cataloging of existing MI attacks and extensive experimental evaluations of them, we find that a model's vulnerability to MI attacks is tightly related to the generalization gap -- the difference between training accuracy and test accuracy. We then propose a defense against MI attacks that aims to close the gap by intentionally reducing the training accuracy. More specifically, the training process attempts to match the training and validation accuracies by means of a new set regularizer, which uses the Maximum Mean Discrepancy between the softmax output empirical distributions of the training and validation sets. Our experimental results show that combining this approach with another simple defense (mix-up training) significantly improves on state-of-the-art defenses against MI attacks, with minimal impact on test accuracy.
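As a rough illustration of the defense described in the abstract, the following PyTorch sketch combines mix-up training with an MMD set regularizer between the softmax output distributions of a training batch and a validation batch. This is a minimal sketch under stated assumptions, not the authors' implementation: the Gaussian kernel, the use of un-mixed inputs for the MMD term, and the hyperparameters lambda_mmd, sigma, and alpha are all illustrative choices.

# Illustrative sketch (not the authors' code) of a training loss combining
# mix-up with an MMD set regularizer between softmax output distributions.
import torch
import torch.nn.functional as F

def gaussian_kernel(x, y, sigma=1.0):
    # Pairwise Gaussian (RBF) kernel matrix between rows of x and y.
    sq_dists = torch.cdist(x, y) ** 2
    return torch.exp(-sq_dists / (2.0 * sigma ** 2))

def mmd2(p, q, sigma=1.0):
    # Biased estimate of squared Maximum Mean Discrepancy between the
    # empirical distributions of p and q (each row is a softmax vector).
    return (gaussian_kernel(p, p, sigma).mean()
            + gaussian_kernel(q, q, sigma).mean()
            - 2.0 * gaussian_kernel(p, q, sigma).mean())

def mixup(x, y_onehot, alpha=0.2):
    # Mix-up: train on convex combinations of random pairs of inputs/labels.
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    return (lam * x + (1 - lam) * x[perm],
            lam * y_onehot + (1 - lam) * y_onehot[perm])

def defended_loss(model, train_x, train_y_onehot, val_x,
                  lambda_mmd=1.0, sigma=1.0, alpha=0.2):
    # Cross-entropy on a mixed-up training batch (soft labels).
    mixed_x, mixed_y = mixup(train_x, train_y_onehot, alpha)
    ce = -(mixed_y * F.log_softmax(model(mixed_x), dim=1)).sum(dim=1).mean()
    # Set regularizer: MMD between softmax outputs on the (un-mixed) training
    # batch and a held-out validation batch, which pushes the two output
    # distributions -- and hence training and validation accuracy -- together.
    p_train = F.softmax(model(train_x), dim=1)
    p_val = F.softmax(model(val_x), dim=1)
    return ce + lambda_mmd * mmd2(p_train, p_val, sigma)

Minimizing this loss penalizes any gap between how confidently the model scores members (training data) and non-members (validation data), which is the signal most MI attacks exploit.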
