Paper Title

Generative Adversarial Exploration for Reinforcement Learning

Authors

Weijun Hong, Menghui Zhu, Minghuan Liu, Weinan Zhang, Ming Zhou, Yong Yu, Peng Sun

Abstract

Exploration is crucial for training the optimal reinforcement learning (RL) policy, where the key is to discriminate whether a visited state is novel. Most previous work focuses on designing heuristic rules or distance metrics to check whether a state is novel, without considering that such a discrimination process can itself be learned. In this paper, we propose a novel method called generative adversarial exploration (GAEX) to encourage exploration in RL by introducing an intrinsic reward output from a generative adversarial network, where the generator provides fake samples of states that help the discriminator identify those less frequently visited states. Thus, the agent is encouraged to visit those states which the discriminator is less confident to judge as visited. GAEX is easy to implement and highly training-efficient. In our experiments, we apply GAEX to DQN, and the DQN-GAEX algorithm achieves convincing performance on challenging exploration problems, including the games Venture, Montezuma's Revenge, and Super Mario Bros, without further fine-tuning of complicated learning algorithms. To our knowledge, this is the first work to employ a GAN for RL exploration problems.
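Below is a minimal, hypothetical PyTorch sketch of the GAN-based intrinsic-reward idea the abstract describes: a generator produces fake state samples, a discriminator is trained to tell them apart from states the agent has actually visited, and the exploration bonus is larger for states the discriminator is less confident to judge as visited. The network sizes, the reward form 1 - D(s), and the bonus weight beta are illustrative assumptions, not details taken from the paper.

import torch
import torch.nn as nn

# Illustrative dimensions; not from the paper.
STATE_DIM, HIDDEN, NOISE_DIM = 16, 64, 8

def mlp(in_dim, out_dim):
    # Small fully connected network used for both generator and discriminator.
    return nn.Sequential(nn.Linear(in_dim, HIDDEN), nn.ReLU(),
                         nn.Linear(HIDDEN, out_dim))

generator = mlp(NOISE_DIM, STATE_DIM)
discriminator = nn.Sequential(mlp(STATE_DIM, 1), nn.Sigmoid())
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
bce = nn.BCELoss()

def gan_update(visited_states):
    """One adversarial step on a batch of states the agent actually visited."""
    batch = visited_states.size(0)
    fake_states = generator(torch.randn(batch, NOISE_DIM))

    # Discriminator: label visited states 1 ("seen before"), generated fakes 0.
    d_loss = bce(discriminator(visited_states), torch.ones(batch, 1)) + \
             bce(discriminator(fake_states.detach()), torch.zeros(batch, 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator tries to make its fake states look like visited ones.
    g_loss = bce(discriminator(fake_states), torch.ones(batch, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

def intrinsic_reward(state):
    """Assumed bonus shape: higher when the discriminator is less confident
    that the state has been visited, i.e. when D(s) is far from 1."""
    with torch.no_grad():
        return float(1.0 - discriminator(state.unsqueeze(0)))

# Usage sketch: add the scaled bonus to the environment reward before the
# DQN update, e.g. total_reward = env_reward + beta * intrinsic_reward(s).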
