Out-of-Dynamics Imitation Learning from Multimodal Demonstrations

Paper Title

Out-of-Dynamics Imitation Learning from Multimodal Demonstrations

Paper Authors

Yiwen Qiu, Jialong Wu, Zhangjie Cao, Mingsheng Long

Paper Abstract

Existing imitation learning work mainly assumes that the demonstrator who collects demonstrations shares the same dynamics as the imitator. However, this assumption limits the use of imitation learning, especially when collecting demonstrations for the imitator is difficult. In this paper, we study out-of-dynamics imitation learning (OOD-IL), which relaxes the assumption: the demonstrator and the imitator have the same state space but may have different action spaces and dynamics. OOD-IL enables imitation learning to utilize demonstrations from a wide range of demonstrators but introduces a new challenge: some demonstrations cannot be achieved by the imitator because of the different dynamics. Prior work tries to filter out such demonstrations using feasibility measurements, but ignores the fact that the demonstrations exhibit a multimodal distribution, since different demonstrators may follow different policies under different dynamics. We develop a better transferability measurement to tackle this newly emerged challenge. We first design a novel sequence-based contrastive clustering algorithm that groups demonstrations from the same mode, avoiding mutual interference between demonstrations from different modes, and then learn the transferability of each demonstration within each cluster using an adversarial-learning-based algorithm. Experimental results on several MuJoCo environments, a driving environment, and a simulated robot environment show that the proposed transferability measurement more accurately finds and down-weights non-transferable demonstrations and outperforms prior work in final imitation learning performance. Videos of our experimental results are available on our website.
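To make the idea of down-weighting non-transferable demonstrations concrete, the following is a minimal sketch and not the authors' implementation. It assumes each demonstration already has a non-negative transferability score and a scalar imitation loss. The function names `normalize_weights` and `weighted_imitation_loss` are hypothetical, and the abstract does not specify how the paper combines scores and losses.

```python
from typing import List, Sequence


def normalize_weights(scores: Sequence[float]) -> List[float]:
    """Rescale non-negative transferability scores so they sum to 1.

    Hypothetical helper: the scores are assumed to come from some
    transferability estimator, which is not implemented here.
    """
    if any(s < 0 for s in scores):
        raise ValueError("scores must be non-negative")
    total = sum(scores)
    if total <= 0:
        raise ValueError("at least one score must be positive")
    return [s / total for s in scores]


def weighted_imitation_loss(
    per_demo_losses: Sequence[float], scores: Sequence[float]
) -> float:
    """Average per-demonstration losses, weighted by transferability.

    Demonstrations with low scores contribute little to the total,
    which is one simple way to down-weight them during training.
    """
    if len(per_demo_losses) != len(scores):
        raise ValueError("need exactly one score per demonstration")
    weights = normalize_weights(scores)
    return sum(w * loss for w, loss in zip(weights, per_demo_losses))


# Two demonstrations: the second has a high loss and a low score,
# so it influences the objective far less than under uniform weights.
weighted = weighted_imitation_loss([1.0, 4.0], [0.9, 0.1])
uniform = weighted_imitation_loss([1.0, 4.0], [1.0, 1.0])
```

With equal scores the objective reduces to the plain mean of the per-demonstration losses, so uniform weighting is the special case where every demonstration is treated as equally transferable.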

Paper Title