Paper Title
TrojanZoo: Towards Unified, Holistic, and Practical Evaluation of Neural Backdoors
Paper Authors
Paper Abstract
Neural backdoors represent one primary threat to the security of deep learning systems. Intensive research in this area has produced a plethora of backdoor attacks/defenses, resulting in a constant arms race. However, due to the lack of evaluation benchmarks, many critical questions remain under-explored: (i) what are the strengths and limitations of different attacks/defenses? (ii) what are the best practices to operate them? and (iii) how can the existing attacks/defenses be further improved? To bridge this gap, we design and implement TROJANZOO, the first open-source platform for evaluating neural backdoor attacks/defenses in a unified, holistic, and practical manner. Thus far, focusing on the computer vision domain, it has incorporated 8 representative attacks, 14 state-of-the-art defenses, 6 attack performance metrics, and 10 defense utility metrics, as well as rich tools for in-depth analysis of attack-defense interactions. Leveraging TROJANZOO, we conduct a systematic study of the existing attacks/defenses, unveiling their complex design spectrum: both manifest intricate trade-offs among multiple desiderata (e.g., the effectiveness, evasiveness, and transferability of attacks). We further explore improving the existing attacks/defenses, leading to a number of interesting findings: (i) one-pixel triggers often suffice; (ii) training from scratch often outperforms perturbing benign models to craft trojan models; (iii) optimizing triggers and trojan models jointly greatly improves both attack effectiveness and evasiveness; (iv) individual defenses can often be evaded by adaptive attacks; and (v) exploiting model interpretability significantly improves defense robustness. We envision that TROJANZOO will serve as a valuable platform to facilitate future research on neural backdoors.