Paper Title

Weakly Supervised Disentangled Representation for Goal-conditioned Reinforcement Learning

Authors

Qian, Zhifeng, You, Mingyu, Zhou, Hongjun, He, Bin

Abstract

Goal-conditioned reinforcement learning is a crucial yet challenging algorithm which enables agents to achieve multiple user-specified goals when learning a set of skills in a dynamic environment. However, it typically requires millions of environmental interactions explored by agents, which is sample-inefficient. In this paper, we propose a skill learning framework, DR-GRL, that aims to improve sample efficiency and policy generalization by combining Disentangled Representation learning with Goal-conditioned visual Reinforcement Learning. We propose a Spatial Transform AutoEncoder (STAE), trained in a weakly supervised manner, to learn an interpretable and controllable representation in which different parts correspond to different object attributes (shape, color, position). Due to the high controllability of the representations, STAE can simply recombine and recode them to generate unseen goals for agents to practice on. The manifold structure of the learned representation remains consistent with physical position, which benefits reward calculation. We empirically demonstrate that DR-GRL significantly outperforms previous methods in sample efficiency and policy generalization. In addition, DR-GRL is also easy to extend to a real robot.
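
As a rough illustration of the abstract's core idea, the sketch below shows how a disentangled latent vector might be recombined to form an unseen goal and how a latent-space distance could serve as the reward. This is not the authors' implementation: the slice layout, dimensions, and function names are all assumptions made only for illustration.

```python
# Hypothetical sketch (not the DR-GRL code): recombining disentangled latent
# parts to form a new goal and computing a latent-distance reward.
# The slice layout and dimensions below are assumptions.
import numpy as np

# Assume a learned encoder maps an image to a disentangled latent vector whose
# slices correspond to object attributes (shape, color, position).
SHAPE = slice(0, 4)
COLOR = slice(4, 8)
POS = slice(8, 10)

def recombine_goal(z_current: np.ndarray, z_source: np.ndarray) -> np.ndarray:
    """Build an unseen goal by keeping the current object's shape/color codes
    but borrowing the position code from another observation."""
    z_goal = z_current.copy()
    z_goal[POS] = z_source[POS]
    return z_goal

def latent_reward(z_achieved: np.ndarray, z_goal: np.ndarray) -> float:
    """Negative distance in the position part of the latent space; usable as a
    reward signal if the learned manifold tracks physical position."""
    return -float(np.linalg.norm(z_achieved[POS] - z_goal[POS]))

# Usage with random placeholder latents (stand-ins for encoder outputs):
rng = np.random.default_rng(0)
z_obs, z_other = rng.normal(size=10), rng.normal(size=10)
z_goal = recombine_goal(z_obs, z_other)
print(latent_reward(z_obs, z_goal))
```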
