Paper Title

How to Backdoor Diffusion Models?

Paper Authors

Sheng-Yen Chou, Pin-Yu Chen, Tsung-Yi Ho

Paper Abstract

Diffusion models are state-of-the-art deep learning empowered generative models that are trained based on the principle of learning forward and reverse diffusion processes via progressive noise-addition and denoising. To gain a better understanding of the limitations and potential risks, this paper presents the first study on the robustness of diffusion models against backdoor attacks. Specifically, we propose BadDiffusion, a novel attack framework that engineers compromised diffusion processes during model training for backdoor implantation. At the inference stage, the backdoored diffusion model will behave just like an untampered generator for regular data inputs, while falsely generating some targeted outcome designed by the bad actor upon receiving the implanted trigger signal. Such a critical risk can be dreadful for downstream tasks and applications built upon the problematic model. Our extensive experiments on various backdoor attack settings show that BadDiffusion can consistently lead to compromised diffusion models with high utility and target specificity. Even worse, BadDiffusion can be made cost-effective by simply finetuning a clean pre-trained diffusion model to implant backdoors. We also explore some possible countermeasures for risk mitigation. Our results call attention to potential risks and possible misuse of diffusion models. Our code is available at https://github.com/IBM/BadDiffusion.
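The mechanism described in the abstract, a compromised forward diffusion process in which an implanted trigger steers generation toward an attacker-chosen output, can be made concrete with a short training-step example. The following is a minimal, hypothetical PyTorch sketch, not the authors' exact objective (see the paper and the linked repository for that); eps_model, alpha_bar, trigger, target, and poison_rate are assumed, illustrative names.

import torch
import torch.nn.functional as F

def backdoored_training_step(eps_model, x0, trigger, target, alpha_bar,
                             poison_rate=0.1):
    """One DDPM training step in which a fraction of the batch follows a
    trigger-shifted forward process (illustrative sketch only)."""
    b = x0.size(0)
    # Sample a diffusion timestep per example and look up the cumulative
    # noise-schedule product alpha_bar_t.
    t = torch.randint(0, alpha_bar.size(0), (b,), device=x0.device)
    a = alpha_bar[t].view(b, 1, 1, 1)
    eps = torch.randn_like(x0)

    # Randomly mark a subset of the batch as poisoned.
    poisoned = (torch.rand(b, device=x0.device) < poison_rate).view(b, 1, 1, 1)

    # Poisoned samples diffuse from the attacker's target image, with the
    # trigger pattern blended in more strongly at larger timesteps.
    x0_used = torch.where(poisoned, target.expand_as(x0), x0)
    x_t = a.sqrt() * x0_used + (1.0 - a).sqrt() * eps
    x_t = torch.where(poisoned, x_t + (1.0 - a.sqrt()) * trigger, x_t)

    # The model is still trained to predict the injected Gaussian noise, so
    # clean inputs keep the normal denoising behavior, while inputs carrying
    # the trigger are steered toward the target during sampling.
    return F.mse_loss(eps_model(x_t, t), eps)

On clean batches this reduces to the ordinary DDPM denoising loss, which is consistent with the abstract's claim that the backdoored model retains high utility on regular inputs; only samples diffused with the trigger pattern are tied to the attacker's target.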
