Paper Title

Delayed Reinforcement Learning by Imitation

Paper Authors

Pierre Liotet, Davide Maran, Lorenzo Bisi, Marcello Restelli

Paper Abstract

When the agent's observations or interactions are delayed, classic reinforcement learning tools usually fail. In this paper, we propose a simple yet new and efficient solution to this problem. We assume that, in the undelayed environment, an efficient policy is known or can be easily learned, but the task may suffer from delays in practice and we thus want to take them into account. We present a novel algorithm, Delayed Imitation with Dataset Aggregation (DIDA), which builds upon imitation learning methods to learn how to act in a delayed environment from undelayed demonstrations. We provide a theoretical analysis of the approach that will guide the practical design of DIDA. These results are also of general interest in the delayed reinforcement learning literature by providing bounds on the performance between delayed and undelayed tasks, under smoothness conditions. We show empirically that DIDA obtains high performance with remarkable sample efficiency on a variety of tasks, including robotic locomotion, classic control, and trading.
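The abstract outlines the core mechanism: a DAgger-style imitation loop in which an undelayed expert labels "extended states" (the last delayed observation plus the actions taken since it was produced), so that the learner is trained to act from delayed information alone. The following minimal Python sketch is an illustration of that idea only, not the authors' implementation; `dida_sketch`, `env`, `expert`, `learner`, the fixed `delay`, and the gym-style interface are all assumptions introduced here.

import numpy as np

def dida_sketch(env, expert, learner, delay, n_iters=10, horizon=200):
    """Hypothetical DAgger-style loop under a constant observation delay.

    Assumptions (not the authors' API): `env` follows the classic gym
    interface with a 4-tuple `step`, `expert(state)` returns the undelayed
    expert's action, and `learner` exposes sklearn-style `fit(X, y)` and
    `predict(X)`.
    """
    X, Y = [], []
    for it in range(n_iters):
        state = env.reset()
        # The learner's "extended state": the observation from `delay`
        # steps ago plus the actions taken since it was produced.
        obs_buffer = [state] * (delay + 1)    # obs_buffer[0] is the delayed one
        act_buffer = [env.action_space.sample() for _ in range(delay)]
        for _ in range(horizon):
            x = np.concatenate([np.ravel(obs_buffer[0]), np.ravel(act_buffer)])
            # Dataset aggregation: the undelayed expert labels the extended
            # state with the action it takes in the TRUE current state,
            # which is available at training time.
            y = expert(obs_buffer[-1])
            X.append(x)
            Y.append(y)
            # Roll out with the expert on the first pass, then with the
            # learner, which only ever sees delayed information.
            action = y if it == 0 else learner.predict(x.reshape(1, -1))[0]
            state, _, done, _ = env.step(action)   # classic 4-tuple gym API
            obs_buffer = obs_buffer[1:] + [state]
            if delay:
                act_buffer = act_buffer[1:] + [action]
            if done:
                break
        learner.fit(np.asarray(X), np.asarray(Y))
    return learner

A full implementation would also anneal between expert and learner actions across iterations (DAgger's mixing schedule) and could accommodate stochastic delays; the sketch fixes a constant observation delay for readability.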
