Paper Title
Stealing Deep Reinforcement Learning Models for Fun and Profit
Paper Authors
Paper Abstract
This paper presents the first model extraction attack against Deep Reinforcement Learning (DRL), which enables an external adversary to precisely recover a black-box DRL model solely from its interactions with the environment. Model extraction attacks against supervised Deep Learning models have been widely studied. However, those techniques cannot be applied to the reinforcement learning scenario due to DRL models' high complexity, stochasticity, and limited observable information. We propose a novel methodology to overcome the above challenges. The key insight of our approach is that the process of DRL model extraction is equivalent to imitation learning, a well-established solution for learning sequential decision-making policies. Based on this observation, our methodology first builds a classifier to reveal the training algorithm family of the targeted black-box DRL model based solely on its predicted actions, and then leverages state-of-the-art imitation learning techniques to replicate the model from the identified algorithm family. Experimental results indicate that our methodology can effectively recover DRL models with high fidelity and accuracy. We also demonstrate two use cases to show that our model extraction attack can (1) significantly improve the success rate of adversarial attacks, and (2) steal DRL models stealthily even when they are protected by DNN watermarks. These findings pose a severe threat to the intellectual property and privacy protection of DRL applications.
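To make the two-stage methodology in the abstract concrete, below is a minimal sketch in Python of the pipeline: query the black-box policy to collect (state, action) pairs, classify the training algorithm family using shadow policies with known algorithms, and replicate the policy via behavior cloning (the simplest form of imitation learning; the paper uses stronger, state-of-the-art techniques). All names here (collect_trajectories, env_reset, env_step, the shadow-policy setup) are hypothetical illustrations under assumed interfaces, not the paper's actual implementation.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def collect_trajectories(policy, env_reset, env_step, episodes=50, horizon=200):
    """Query the black-box policy through the environment and record
    (state, action) pairs -- the only information the adversary observes.
    `env_reset`/`env_step` are assumed environment hooks; `env_step`
    is assumed to return (next_state, done)."""
    states, actions = [], []
    for _ in range(episodes):
        s = env_reset()
        for _ in range(horizon):
            a = policy(s)                    # black-box query: state -> action
            states.append(s)
            actions.append(a)
            s, done = env_step(a)
            if done:
                break
    return np.array(states), np.array(actions)

def fit_family_classifier(shadow_action_features, shadow_family_labels):
    """Stage 1: identify the training-algorithm family from predicted actions.
    Assumes the adversary holds shadow policies trained with known algorithms
    (e.g., DQN, PPO, A2C), whose action-sequence features serve as labeled
    training data for the classifier."""
    clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500)
    clf.fit(shadow_action_features, shadow_family_labels)
    return clf

def clone_policy(states, actions):
    """Stage 2: replicate the target model by behavior cloning, i.e.,
    supervised learning from state to the target's action."""
    surrogate = MLPClassifier(hidden_layer_sizes=(128, 128), max_iter=1000)
    surrogate.fit(states, actions)
    return surrogate                         # surrogate.predict(s) mimics the target
```

In the paper's setting, the classifier output of Stage 1 determines which algorithm family (and hence which imitation-learning configuration) Stage 2 should use, which is what allows the replica to reach high fidelity rather than merely matching average behavior.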