Paper Title
Understanding and Combating Robust Overfitting via Input Loss Landscape Analysis and Regularization
Paper Authors
Paper Abstract
Adversarial training is widely used to improve the robustness of deep neural networks to adversarial attacks. However, adversarial training is prone to overfitting, and the cause is far from clear. This work sheds light on the mechanisms underlying robust overfitting by analyzing the loss landscape w.r.t. the input. We find that robust overfitting results from standard training, specifically the minimization of the clean loss, and can be mitigated by regularization of the loss gradients. Moreover, we find that robust overfitting becomes more severe during adversarial training partially because the gradient regularization effect of adversarial training weakens as the curvature of the loss landscape increases. To improve robust generalization, we propose a new regularizer that smooths the loss landscape by penalizing the weighted variation of the logits along the adversarial direction. Our method significantly mitigates robust overfitting and achieves the highest robustness and efficiency compared to similar previous methods. Code is available at https://github.com/TreeLLi/Combating-RO-AdvLC.
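For concreteness, below is a minimal PyTorch sketch of the kind of regularizer the abstract describes: penalizing how much the logits vary between a clean input and its adversarial counterpart, i.e. along the adversarial direction. The function names (`smoothness_penalty`, `adversarial_training_step`), the unweighted L2 norm, and the coefficient `lam` are illustrative assumptions, not the authors' exact weighted formulation; see the linked repository for the actual implementation.

```python
import torch
import torch.nn.functional as F

def smoothness_penalty(logits_clean: torch.Tensor,
                       logits_adv: torch.Tensor) -> torch.Tensor:
    # Mean squared variation of the logits between each clean input and
    # its adversarial counterpart, i.e. along the adversarial direction.
    # The plain L2 norm is an illustrative stand-in for the paper's
    # weighted variation.
    return (logits_adv - logits_clean).pow(2).sum(dim=1).mean()

def adversarial_training_step(model, optimizer, x, x_adv, y, lam=0.5):
    # One adversarial-training step with the smoothness term added.
    # `x_adv` is assumed to come from a standard attack such as PGD;
    # `lam` is a hypothetical trade-off coefficient.
    model.train()
    optimizer.zero_grad()
    logits_clean = model(x)
    logits_adv = model(x_adv)
    loss = F.cross_entropy(logits_adv, y) \
        + lam * smoothness_penalty(logits_clean, logits_adv)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Intuitively, driving this penalty toward zero flattens the loss landscape along the adversarial direction, which is the smoothing effect the abstract attributes to the proposed regularizer.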