Learning Visual Robotic Control Efficiently with Contrastive Pre-training and Data Augmentation

Paper Title

Learning Visual Robotic Control Efficiently with Contrastive Pre-training and Data Augmentation

Authors

Albert Zhan, Ruihan Zhao, Lerrel Pinto, Pieter Abbeel, Michael Laskin

Abstract

Recent advances in unsupervised representation learning significantly improved the sample efficiency of training Reinforcement Learning policies in simulated environments. However, similar gains have not yet been seen for real-robot reinforcement learning. In this work, we focus on enabling data-efficient real-robot learning from pixels. We present Contrastive Pre-training and Data Augmentation for Efficient Robotic Learning (CoDER), a method that utilizes data augmentation and unsupervised learning to achieve sample-efficient training of real-robot arm policies from sparse rewards. While contrastive pre-training, data augmentation, demonstrations, and reinforcement learning are alone insufficient for efficient learning, our main contribution is showing that the combination of these disparate techniques results in a simple yet data-efficient method. We show that, given only 10 demonstrations, a single robotic arm can learn sparse-reward manipulation policies from pixels, such as reaching, picking, moving, pulling a large object, flipping a switch, and opening a drawer in just 30 minutes of mean real-world training time. We include videos and code on the project website: https://sites.google.com/view/efficient-robotic-manipulation/home
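
The abstract names data augmentation and contrastive pre-training but does not say which augmentation or loss is used. As a rough illustration only, the sketch below assumes random cropping as the augmentation and an InfoNCE-style objective with dot-product similarity, both common choices in contrastive learning from pixels. The function names `random_crop` and `info_nce_loss`, the toy image, and the temperature value are hypothetical and are not taken from the authors' code.

```python
import math
import random


def random_crop(image, out_size, rng):
    """Return a random out_size x out_size window of a 2-D image (list of rows)."""
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - out_size)   # randint is inclusive on both ends
    left = rng.randint(0, width - out_size)
    return [row[left:left + out_size] for row in image[top:top + out_size]]


def info_nce_loss(queries, keys, temperature=1.0):
    """Mean InfoNCE loss: keys[i] is the positive for queries[i], all other keys are negatives."""
    total = 0.0
    for i, q in enumerate(queries):
        # Dot-product similarity between this query and every key (assumed similarity measure).
        logits = [sum(a * b for a, b in zip(q, k)) / temperature for k in keys]
        max_logit = max(logits)  # subtract the max for numerical stability
        log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with the positive as the target
    return total / len(queries)


# Two differently cropped views of the same toy 8x8 "frame" would form a positive pair.
rng = random.Random(0)
frame = [[r * 8 + c for c in range(8)] for r in range(8)]
query_view = random_crop(frame, 6, rng)
key_view = random_crop(frame, 6, rng)

# Toy embeddings: the loss is lower when each query lines up with its own key.
embeddings = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce_loss(embeddings, embeddings)
swapped = info_nce_loss(embeddings, embeddings[::-1])
```

In a full pipeline an image encoder would map each cropped view to an embedding before the loss is computed; that encoder is omitted here to keep the sketch dependency-free.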
