Paper Title

Adversarial Perturbations Fool Deepfake Detectors

Paper Authors

Apurva Gandhi, Shomik Jain

Paper Abstract

This work uses adversarial perturbations to enhance deepfake images and fool common deepfake detectors. We created adversarial perturbations using the Fast Gradient Sign Method and the Carlini and Wagner L2 norm attack in both blackbox and whitebox settings. Detectors achieved over 95% accuracy on unperturbed deepfakes, but less than 27% accuracy on perturbed deepfakes. We also explore two improvements to deepfake detectors: (i) Lipschitz regularization, and (ii) Deep Image Prior (DIP). Lipschitz regularization constrains the gradient of the detector with respect to the input in order to increase robustness to input perturbations. The DIP defense removes perturbations using generative convolutional neural networks in an unsupervised manner. Regularization improved the detection of perturbed deepfakes on average, including a 10% accuracy boost in the blackbox case. The DIP defense achieved 95% accuracy on perturbed deepfakes that fooled the original detector, while retaining 98% accuracy in other cases on a 100-image subsample.
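The Fast Gradient Sign Method mentioned in the abstract can be sketched in a few lines of PyTorch. This is a minimal illustrative sketch, not the authors' implementation; the function name `fgsm_perturb` and the epsilon value are assumptions for illustration.

```python
import torch

def fgsm_perturb(model, x, y, eps=0.01):
    """Fast Gradient Sign Method (hypothetical sketch).

    Perturbs input x by a small step in the direction of the sign of
    the loss gradient, which tends to flip the detector's prediction.
    """
    x = x.clone().detach().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    # Step in the gradient-sign direction, then clamp to the valid
    # pixel range so the result is still a displayable image.
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```

In the whitebox setting the attacker computes this gradient through the target detector itself; in the blackbox setting the abstract implies the perturbation is crafted on a substitute model and transferred.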
