Action Guidance: Getting the Best of Sparse Rewards and Shaped Rewards for Real-time Strategy Games

Paper title

Action Guidance: Getting the Best of Sparse Rewards and Shaped Rewards for Real-time Strategy Games

Authors

Shengyi Huang, Santiago Ontañón

Abstract

Training agents using Reinforcement Learning in games with sparse rewards is a challenging problem, since large amounts of exploration are required to retrieve even the first reward. To tackle this problem, a common approach is to use reward shaping to help exploration. However, an important drawback of reward shaping is that agents sometimes learn to optimize the shaped reward instead of the true objective. In this paper, we present a novel technique that we call action guidance that successfully trains agents to eventually optimize the true objective in games with sparse rewards while maintaining most of the sample efficiency that comes with reward shaping. We evaluate our approach in a simplified real-time strategy (RTS) game simulator called $\mu$RTS.
