Paper Title


On the Permanence of Backdoors in Evolving Models

Paper Authors

Huiying Li, Arjun Nitin Bhagoji, Yuxin Chen, Haitao Zheng, Ben Y. Zhao

Abstract


Existing research on training-time attacks for deep neural networks (DNNs), such as backdoors, largely assumes that models are static once trained, and that hidden backdoors trained into models remain active indefinitely. In practice, models are rarely static but evolve continuously to address distribution drifts in the underlying data. This paper explores the behavior of backdoor attacks in time-varying models, whose model weights are continually updated via fine-tuning to adapt to data drifts. Our theoretical analysis shows how fine-tuning with fresh data progressively "erases" the injected backdoors, and our empirical study illustrates how quickly a time-varying model "forgets" backdoors under a variety of training and attack settings. We also show that novel fine-tuning strategies using smart learning rates can significantly accelerate backdoor forgetting. Finally, we discuss the need for new backdoor defenses that target time-varying models specifically.
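The abstract describes measuring how quickly fine-tuning on fresh, clean data makes a model "forget" an injected backdoor. Below is a minimal sketch of such a measurement loop, assuming a PyTorch classifier; the model, `add_trigger`, `target_label`, and the synthetic "fresh data" are illustrative placeholders rather than the paper's actual setup. The quantity to watch decay over fine-tuning epochs is the attack success rate on triggered inputs.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Illustrative placeholders (not from the paper): a tiny classifier and a
# synthetic stream of post-deployment "fresh" data used for fine-tuning.
torch.manual_seed(0)
num_classes, input_dim, target_label = 10, 32, 0

model = nn.Sequential(nn.Linear(input_dim, 64), nn.ReLU(), nn.Linear(64, num_classes))

def add_trigger(x):
    # Hypothetical trigger: overwrite the last few features with a fixed pattern.
    x = x.clone()
    x[:, -4:] = 3.0
    return x

def attack_success_rate(model, x_clean):
    # Fraction of triggered inputs classified as the attacker's target label.
    with torch.no_grad():
        preds = model(add_trigger(x_clean)).argmax(dim=1)
    return (preds == target_label).float().mean().item()

# Fresh, clean data observed after deployment (synthetic here).
x_fresh = torch.randn(2048, input_dim)
y_fresh = torch.randint(0, num_classes, (2048,))
loader = DataLoader(TensorDataset(x_fresh, y_fresh), batch_size=128, shuffle=True)

opt = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.CrossEntropyLoss()

# Fine-tune on clean data and track how the backdoor's success rate decays.
for epoch in range(5):
    for xb, yb in loader:
        opt.zero_grad()
        loss_fn(model(xb), yb).backward()
        opt.step()
    print(f"epoch {epoch}: attack success rate = {attack_success_rate(model, x_fresh):.3f}")
```

The "smart learning rate" strategies mentioned in the abstract would enter this loop by adjusting `lr` (e.g., raising it temporarily) to accelerate forgetting; the fixed SGD rate above is only a baseline assumption.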
