Paper Title

Reset-Free Lifelong Learning with Skill-Space Planning

Authors

Kevin Lu, Aditya Grover, Pieter Abbeel, Igor Mordatch

Abstract

The objective of lifelong reinforcement learning (RL) is to optimize agents which can continuously adapt and interact in changing environments. However, current RL approaches fail drastically when environments are non-stationary and interactions are non-episodic. We propose Lifelong Skill Planning (LiSP), an algorithmic framework for non-episodic lifelong RL based on planning in an abstract space of higher-order skills. We learn the skills in an unsupervised manner using intrinsic rewards and plan over the learned skills using a learned dynamics model. Moreover, our framework permits skill discovery even from offline data, thereby reducing the need for excessive real-world interactions. We demonstrate empirically that LiSP successfully enables long-horizon planning and learns agents that can avoid catastrophic failures even in challenging non-stationary and non-episodic environments derived from gridworld and MuJoCo benchmarks.
