Reset-Free Lifelong Learning with Skill-Space Planning

Paper Title

Reset-Free Lifelong Learning with Skill-Space Planning

Authors

Kevin Lu, Aditya Grover, Pieter Abbeel, Igor Mordatch

Abstract

The objective of lifelong reinforcement learning (RL) is to optimize agents which can continuously adapt and interact in changing environments. However, current RL approaches fail drastically when environments are non-stationary and interactions are non-episodic. We propose Lifelong Skill Planning (LiSP), an algorithmic framework for non-episodic lifelong RL based on planning in an abstract space of higher-order skills. We learn the skills in an unsupervised manner using intrinsic rewards and plan over the learned skills using a learned dynamics model. Moreover, our framework permits skill discovery even from offline data, thereby reducing the need for excessive real-world interactions. We demonstrate empirically that LiSP successfully enables long-horizon planning and learns agents that can avoid catastrophic failures even in challenging non-stationary and non-episodic environments derived from gridworld and MuJoCo benchmarks.
