Paper Title
Intrinsic Certified Robustness of Bagging against Data Poisoning Attacks
Paper Authors
Paper Abstract
In a \emph{data poisoning attack}, an attacker modifies, deletes, and/or inserts some training examples to corrupt the learnt machine learning model. \emph{Bootstrap Aggregating (bagging)} is a well-known ensemble learning method, which trains multiple base models on random subsamples of a training dataset using a base learning algorithm and uses majority vote to predict labels of testing examples. We prove the intrinsic certified robustness of bagging against data poisoning attacks. Specifically, we show that bagging with an arbitrary base learning algorithm provably predicts the same label for a testing example when the number of modified, deleted, and/or inserted training examples is bounded by a threshold. Moreover, we show that our derived threshold is tight if no assumptions on the base learning algorithm are made. We evaluate our method on MNIST and CIFAR10. For instance, our method achieves a certified accuracy of $91.1\%$ on MNIST when arbitrarily modifying, deleting, and/or inserting 100 training examples. Code is available at: \url{https://github.com/jjy1994/BaggingCertifyDataPoisoning}.
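The following is a minimal sketch of the bagging prediction pipeline the abstract describes: train k base models on random subsamples (drawn with replacement) of the training set, then label a test example by majority vote. The function names (train_bagging, predict_majority) and the toy base learner are illustrative assumptions, not the API of the paper's released code.

```python
# Minimal sketch of bagging with majority-vote prediction.
# All names here are hypothetical; see the paper's repository for the real code.
import random
from collections import Counter

def train_bagging(train_set, base_learner, k, subsample_size, seed=0):
    """Train k base models, each on a random subsample drawn with replacement."""
    rng = random.Random(seed)
    models = []
    for _ in range(k):
        subsample = [rng.choice(train_set) for _ in range(subsample_size)]
        models.append(base_learner(subsample))  # base_learner returns a model callable
    return models

def predict_majority(models, x):
    """Predict the label of x by majority vote over the base models."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    # Toy usage: a base learner that always predicts its subsample's majority label.
    data = [(0, "a")] * 60 + [(1, "b")] * 40
    def majority_class_learner(subsample):
        label = Counter(y for _, y in subsample).most_common(1)[0][0]
        return lambda x: label
    models = train_bagging(data, majority_class_learner, k=101, subsample_size=30)
    print(predict_majority(models, x=None))  # prints "a" with high probability
```

Intuitively, the certified robustness arises because each subsample is small: modifying, deleting, or inserting a bounded number of training examples can only change the predictions of a bounded fraction of the base models, so the majority vote is unchanged whenever the vote margin is large enough.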