Paper Title
TrojanZoo: Towards Unified, Holistic, and Practical Evaluation of Neural Backdoors
Paper Authors
Paper Abstract
Neural backdoors represent one primary threat to the security of deep learning systems. Intensive research in this area has produced a plethora of backdoor attacks/defenses, resulting in a constant arms race. However, due to the lack of evaluation benchmarks, many critical questions remain under-explored: (i) what are the strengths and limitations of different attacks/defenses? (ii) what are the best practices to operate them? and (iii) how can the existing attacks/defenses be further improved? To bridge this gap, we design and implement TROJANZOO, the first open-source platform for evaluating neural backdoor attacks/defenses in a unified, holistic, and practical manner. Thus far, focusing on the computer vision domain, it has incorporated 8 representative attacks, 14 state-of-the-art defenses, 6 attack performance metrics, and 10 defense utility metrics, as well as rich tools for in-depth analysis of attack-defense interactions. Leveraging TROJANZOO, we conduct a systematic study of the existing attacks/defenses, unveiling their complex design spectrum: both manifest intricate trade-offs among multiple desiderata (e.g., the effectiveness, evasiveness, and transferability of attacks). We further explore improving the existing attacks/defenses, leading to a number of interesting findings: (i) one-pixel triggers often suffice; (ii) training from scratch often outperforms perturbing benign models to craft trojan models; (iii) optimizing triggers and trojan models jointly greatly improves both attack effectiveness and evasiveness; (iv) individual defenses can often be evaded by adaptive attacks; and (v) exploiting model interpretability significantly improves defense robustness. We envision that TROJANZOO will serve as a valuable platform to facilitate future research on neural backdoors.