Paper Title
Query-based Targeted Action-Space Adversarial Policies on Deep Reinforcement Learning Agents
Paper Authors
Paper Abstract
Advances in computing resources have increased the complexity of cyber-physical systems (CPS). As that complexity has grown, the focus has shifted from traditional control methods to deep reinforcement learning (DRL)-based methods for controlling these systems, because accurate models of complex CPS are difficult to obtain for traditional control. However, to securely deploy DRL in production, it is essential to examine the weaknesses of DRL-based controllers (policies) against malicious attacks from all angles. In this work, we investigate targeted attacks in the action-space domain, commonly known as actuation attacks in the CPS literature, which perturb the outputs of a controller. We show that a query-based black-box attack model that generates optimal perturbations with respect to an adversarial goal can be formulated as another reinforcement learning problem. Thus, such an adversarial policy can be trained using conventional DRL methods. Experimental results show that adversarial policies that observe only the nominal policy's output generate stronger attacks than adversarial policies that observe both the nominal policy's input and output. Further analysis reveals that nominal policies whose outputs are frequently at the boundaries of the action space are naturally more robust against adversarial policies. Lastly, we propose using adversarial training with transfer learning to induce robust behaviors in the nominal policy, which decreases the rate of successful targeted attacks by 50%.
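The formulation of the attack as a separate reinforcement learning problem can be illustrated with a small sketch. This is not the authors' implementation: the 1-D system `PointEnv`, the hand-written `nominal_policy`, the perturbation `budget`, the distance-to-target reward, and the `rollout` helper are all hypothetical stand-ins chosen for illustration. The adversary's observation is only the nominal policy's output (obtained by querying it), its action is a bounded perturbation added to that output, and its reward measures progress toward an adversarial target state.

```python
import random


def clip(value, low, high):
    """Clamp value to the closed interval [low, high]."""
    return max(low, min(high, value))


class PointEnv:
    """Hypothetical 1-D system: x <- x + 0.1 * a, with a in [-1, 1]."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.x = 0.0

    def reset(self):
        self.x = self.rng.uniform(-1.0, 1.0)
        return self.x

    def step(self, action):
        self.x += 0.1 * clip(action, -1.0, 1.0)
        return self.x


def nominal_policy(x):
    """Stand-in for a trained DRL controller that drives x toward 0."""
    return clip(-2.0 * x, -1.0, 1.0)


class AdversaryEnv:
    """The attack viewed as its own RL problem (illustrative assumptions).

    Observation: the nominal policy's output only (black-box query).
    Action:      a perturbation delta, clipped to [-budget, budget].
    Reward:      negative distance of the state to the adversarial target.
    """

    def __init__(self, env, policy, target=1.0, budget=0.5):
        self.env = env
        self.policy = policy
        self.target = target
        self.budget = budget
        self.nominal_action = 0.0

    def reset(self):
        x = self.env.reset()
        self.nominal_action = self.policy(x)  # query the nominal policy
        return self.nominal_action

    def step(self, delta):
        delta = clip(delta, -self.budget, self.budget)
        # The perturbed action is still confined to the action space.
        perturbed = clip(self.nominal_action + delta, -1.0, 1.0)
        x = self.env.step(perturbed)
        reward = -abs(x - self.target)
        self.nominal_action = self.policy(x)
        return self.nominal_action, reward


def rollout(adversary, steps=200, seed=0):
    """Run a fixed adversary and return its final-step reward."""
    adv_env = AdversaryEnv(PointEnv(seed), nominal_policy)
    obs = adv_env.reset()
    reward = 0.0
    for _ in range(steps):
        obs, reward = adv_env.step(adversary(obs))
    return reward


if __name__ == "__main__":
    print("no attack:      ", rollout(lambda obs: 0.0))
    print("constant attack:", rollout(lambda obs: 0.5))
```

In this toy setting a constant perturbation already pulls the state toward the target. Because `AdversaryEnv` exposes the usual reset/step interface, the fixed lambda could in principle be replaced by an adversarial policy trained with any conventional DRL algorithm.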