Paper Title
Parrot: Data-Driven Behavioral Priors for Reinforcement Learning
Paper Authors
Paper Abstract
Reinforcement learning provides a general framework for flexible decision making and control, but requires extensive data collection for each new task that an agent needs to learn. In other machine learning fields, such as natural language processing or computer vision, pre-training on large, previously collected datasets to bootstrap learning for new tasks has emerged as a powerful paradigm to reduce data requirements when learning a new task. In this paper, we ask the following question: how can we enable similarly useful pre-training for RL agents? We propose a method for pre-training behavioral priors that can capture complex input-output relationships observed in successful trials from a wide range of previously seen tasks, and we show how this learned prior can be used for rapidly learning new tasks without impeding the RL agent's ability to try out novel behaviors. We demonstrate the effectiveness of our approach in challenging robotic manipulation domains involving image observations and sparse reward functions, where our method outperforms prior works by a substantial margin.
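The core idea can be illustrated with a toy sketch: fit a state-conditioned, invertible mapping to actions from successful trials, then let the RL agent act in the latent space of that mapping. The snippet below is a minimal illustration only, using a single conditional Gaussian (affine) layer fit by maximum likelihood in place of the deep normalizing flow the paper trains; all names (`prior_decode`, the toy data) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "successful trials": in state s, demonstrated actions cluster near 2*s + 1.
states = rng.uniform(-1.0, 1.0, size=(500, 1))
actions = 2.0 * states + 1.0 + 0.1 * rng.standard_normal((500, 1))

# Behavioral prior: a state-conditioned affine map a = mu(s) + sigma * z with
# mu(s) = w*s + b. For a linear-Gaussian model, maximum-likelihood training
# reduces to least squares (the paper instead trains a deep normalizing flow).
X = np.hstack([states, np.ones_like(states)])
w, b = np.linalg.lstsq(X, actions, rcond=None)[0].ravel()
sigma = float(np.std(actions - (w * states + b)))

def prior_decode(z, s):
    """Map the RL agent's latent action z through the learned prior."""
    return w * s + b + sigma * z

# At z = 0 the prior proposes the behavior seen in the data; because the map is
# invertible, nonzero z can still reach any action, so novel behavior stays possible.
print(prior_decode(0.0, 0.5))  # close to 2*0.5 + 1 = 2.0
```

Because the mapping is invertible for every state, pre-training biases exploration toward previously successful behavior without restricting the set of reachable actions.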