Paper Title
With Great Dispersion Comes Greater Resilience: Efficient Poisoning Attacks and Defenses for Linear Regression Models
Paper Authors
Paper Abstract
With the rise of third parties in the machine learning pipeline, whether service providers in "Machine Learning as a Service" (MLaaS) or external data contributors in online learning and in the retraining of existing models, ensuring the security of the resulting machine learning models has become an increasingly important topic. The security community has demonstrated that, without transparency of the data and of the resulting model, many potential security risks exist, with new risks constantly being discovered. In this paper, we focus on one of these security risks: poisoning attacks. Specifically, we analyze how attackers may interfere with the results of regression learning by poisoning the training datasets. To this end, we analyze and develop a new poisoning attack algorithm. In contrast with previous poisoning attack algorithms, our attack, termed Nopt, can produce larger errors with the same proportion of poisoned data points. Furthermore, we significantly improve the state-of-the-art defense algorithm TRIM, proposed by Jagielski et al. (IEEE S&P 2018), by incorporating probability estimation of clean data points into the algorithm. Our new defense algorithm, termed Proda, demonstrates increased effectiveness in reducing the errors arising from poisoned datasets by optimizing ensemble models. We highlight that the time complexity of TRIM had not previously been estimated; however, we deduce from their work that TRIM can take exponential time in the worst case, in excess of Proda's logarithmic time. The performance of both our proposed attack and defense algorithms is extensively evaluated on four real-world datasets covering housing prices, loans, health care, and bike sharing services. We hope that our work will inspire future research to develop more robust learning algorithms immune to poisoning attacks.
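For readers unfamiliar with the setting, the following minimal Python sketch (ours, not the paper's Nopt attack or Proda defense) illustrates the two sides the abstract describes: a naive label-poisoning attack on ordinary least squares, and a TRIM-style defense that alternately fits the model and keeps only the best-fitting points, in the spirit of Jagielski et al.'s TRIM. The data, poisoning strategy, and all parameter choices here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic clean regression data: y = X w* + noise.
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def fit(X, y):
    """Ordinary least squares (no intercept, for brevity)."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def mse(X, y, w):
    return float(np.mean((X @ w - y) ** 2))

# Naive poisoning: append a small fraction of points whose labels are
# pushed far from the clean model, dragging the fit toward them.
p_rate = 0.1                                 # 10% poisoning budget (illustrative)
n_poison = int(p_rate * n)
Xp = rng.normal(size=(n_poison, d))
yp = y.max() + 5.0 * np.ones(n_poison)       # adversarial labels
X_all = np.vstack([X, Xp])
y_all = np.concatenate([y, yp])

w_clean = fit(X, y)
w_poisoned = fit(X_all, y_all)
print("clean-model MSE on clean data:   ", mse(X, y, w_clean))
print("poisoned-model MSE on clean data:", mse(X, y, w_poisoned))

# TRIM-style defense sketch: alternately fit the model and keep only the
# n points with the smallest residuals, until the kept subset stabilizes.
def trim_defense(X_all, y_all, n_keep, iters=20):
    idx = np.arange(n_keep)                  # arbitrary initial subset
    for _ in range(iters):
        w = fit(X_all[idx], y_all[idx])
        resid = (X_all @ w - y_all) ** 2
        new_idx = np.argsort(resid)[:n_keep] # keep the best-fitting points
        if np.array_equal(np.sort(new_idx), np.sort(idx)):
            break
        idx = new_idx
    return fit(X_all[idx], y_all[idx])

w_trim = trim_defense(X_all, y_all, n_keep=n)
print("trimmed-model MSE on clean data: ", mse(X, y, w_trim))
```

Running the sketch shows the poisoned model's error on the clean data rising well above the clean baseline, while the trimmed fit recovers close to it; the paper's contribution is an attack (Nopt) that inflates this error further for the same poisoning budget, and a defense (Proda) that improves on the trimming step via probability estimation of clean points and ensemble models.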