Practical Imitation Learning in the Real World via Task Consistency Loss

Paper Title

Practical Imitation Learning in the Real World via Task Consistency Loss

Authors

Khansari, Mohi; Ho, Daniel; Du, Yuqing; Fuentes, Armando; Bennice, Matthew; Sievers, Nicolas; Kirmani, Sean; Bai, Yunfei; Jang, Eric

Abstract

Recent work in visual end-to-end learning for robotics has shown the promise of imitation learning across a variety of tasks. Such approaches are expensive both because they require large amounts of real-world training demonstrations and because identifying the best model to deploy in the real world requires time-consuming real-world evaluations. These challenges can be mitigated by simulation: by supplementing real-world data with simulated demonstrations and using simulated evaluations to identify high-performing policies. However, this introduces the well-known "reality gap" problem, where simulator inaccuracies decorrelate performance in simulation from that of reality. In this paper, we build on top of prior work in GAN-based domain adaptation and introduce the notion of a Task Consistency Loss (TCL), a self-supervised loss that encourages sim and real alignment both at the feature and action-prediction levels. We demonstrate the effectiveness of our approach by teaching a mobile manipulator to autonomously approach a door, turn the handle to open the door, and enter the room. The policy performs control from RGB and depth images and generalizes to doors not encountered in training data. We achieve 72% success across sixteen seen and unseen scenes using only ~16.2 hours of teleoperated demonstrations in sim and real. To the best of our knowledge, this is the first work to tackle latched door opening from a purely end-to-end learning approach, where the tasks of navigation and manipulation are jointly modeled by a single neural network.
