Paper Title
Primal Wasserstein Imitation Learning
Paper Authors
Paper Abstract
Imitation Learning (IL) methods seek to match the behavior of an agent with that of an expert. In the present work, we propose a new IL method based on a conceptually simple algorithm: Primal Wasserstein Imitation Learning (PWIL), which ties to the primal form of the Wasserstein distance between the expert and the agent state-action distributions. We present a reward function which is derived offline, as opposed to recent adversarial IL algorithms that learn a reward function through interactions with the environment, and which requires little fine-tuning. We show that we can recover expert behavior on a variety of continuous control tasks of the MuJoCo domain in a sample efficient manner, in terms of both agent interactions and expert interactions with the environment. Finally, we show that the behavior of the agent we train matches the behavior of the expert with respect to the Wasserstein distance, rather than the commonly used proxy of performance.
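To make the idea of an offline reward concrete, below is a minimal Python sketch of a greedy-coupling reward in the spirit of what the abstract describes: at each agent step, a fixed amount of probability mass is matched to the closest remaining expert state-action points, and the matched transport cost (an upper bound on the primal Wasserstein objective) is turned into a reward. The function name pwil_reward_fn, the Euclidean cost, and the values of alpha, beta, and episode_len are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def pwil_reward_fn(expert_sa, alpha=5.0, beta=5.0, episode_len=1000):
    """Build a stateful per-step reward from expert data (PWIL-style sketch).

    expert_sa: (N, d) array of concatenated expert state-action vectors.
    alpha, beta, episode_len: illustrative hyperparameters (assumptions).
    """
    expert_sa = np.asarray(expert_sa, dtype=np.float64)
    n, d = expert_sa.shape
    weights = np.full(n, 1.0 / n)   # each expert point carries mass 1/N
    step_mass = 1.0 / episode_len   # each agent step consumes mass 1/T
    scale = beta * episode_len / np.sqrt(d)

    def reward(agent_sa):
        """Greedily transport this step's mass to the nearest expert points."""
        remaining = step_mass
        cost = 0.0
        dists = np.linalg.norm(expert_sa - agent_sa, axis=1)
        for j in np.argsort(dists):   # closest expert points first
            if remaining <= 0.0:
                break
            take = min(weights[j], remaining)  # consume available expert mass
            cost += take * dists[j]
            weights[j] -= take
            remaining -= take
        return alpha * np.exp(-scale * cost)

    return reward
```

A reward for an agent transition would then be queried as r = reward(np.concatenate([state, action])). Because the sketch depletes expert mass as it is matched, the reward function would be rebuilt at the start of each episode.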