Paper Title

Illusory Attacks: Information-Theoretic Detectability Matters in Adversarial Attacks

Paper Authors

Franzmeyer, Tim, McAleer, Stephen, Henriques, João F., Foerster, Jakob N., Torr, Philip H. S., Bibi, Adel, de Witt, Christian Schroeder

Paper Abstract

Autonomous agents deployed in the real world need to be robust against adversarial attacks on sensory inputs. Robustifying agent policies requires anticipating the strongest attacks possible. We demonstrate that existing observation-space attacks on reinforcement learning agents have a common weakness: while effective, their lack of information-theoretic detectability constraints makes them detectable using automated means or human inspection. Detectability is undesirable to adversaries as it may trigger security escalations. We introduce ε-illusory, a novel form of adversarial attack on sequential decision-makers that is both effective and of ε-bounded statistical detectability. We propose a novel dual ascent algorithm to learn such attacks end-to-end. Compared to existing attacks, we empirically find ε-illusory to be significantly harder to detect with automated methods, and a small study with human participants (IRB approval under reference R84123/RE001) suggests they are similarly harder to detect for humans. Our findings suggest the need for better anomaly detectors, as well as effective hardware- and system-level defenses. The project website can be found at https://tinyurl.com/illusory-attacks.
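
The abstract describes learning attacks via dual ascent under an ε-bound on statistical detectability. The sketch below is a hypothetical, minimal illustration of that generic constrained-optimisation structure only (maximise an attack objective subject to a detectability budget ε), not the paper's actual end-to-end algorithm; `attack_objective`, `detectability`, and all parameter values are toy stand-ins introduced here for illustration.

```python
# Toy dual-ascent loop (assumption, not the paper's algorithm):
#   maximise  attack_objective(theta)
#   subject to  detectability(theta) <= eps
import numpy as np

eps = 0.1          # detectability budget (the epsilon bound)
lr_theta = 0.05    # primal step size (attack parameters)
lr_lmbda = 0.1     # dual step size (Lagrange multiplier)

theta = np.zeros(2)   # toy attack parameters
lmbda = 0.0           # dual variable for the detectability constraint

def attack_objective(theta):
    # Toy stand-in for attack effectiveness (e.g. victim regret).
    return theta[0] + 0.5 * theta[1]

def detectability(theta):
    # Toy stand-in for a statistical detectability measure
    # (e.g. a divergence between attacked and nominal observation streams).
    return float(theta @ theta)

def grad(f, x, h=1e-5):
    # Finite-difference gradient, to keep the sketch dependency-free.
    g = np.zeros_like(x)
    for i in range(len(x)):
        d = np.zeros_like(x)
        d[i] = h
        g[i] = (f(x + d) - f(x - d)) / (2 * h)
    return g

for step in range(500):
    # Primal ascent on the Lagrangian:
    #   L(theta, lambda) = objective(theta) - lambda * (detectability(theta) - eps)
    lagrangian = lambda th: attack_objective(th) - lmbda * (detectability(th) - eps)
    theta += lr_theta * grad(lagrangian, theta)
    # Dual ascent: raise lambda when the detectability budget is violated.
    lmbda = max(0.0, lmbda + lr_lmbda * (detectability(theta) - eps))

print(theta, lmbda, detectability(theta))  # detectability ends up near the eps budget
```

In this toy setting the constraint becomes active and the learned parameters settle on the detectability boundary, which mirrors the intuition behind an ε-bounded attack: effectiveness is pushed as far as the detectability budget allows.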
