Generalizing to New Tasks via One-Shot Compositional Subgoals

Paper Title

Generalizing to New Tasks via One-Shot Compositional Subgoals

Authors

Bian, Xihan; Mendez, Oscar; Hadfield, Simon

Abstract

The ability to generalize to previously unseen tasks with little to no supervision is a key challenge in modern machine learning research. It is also a cornerstone of a future "General AI". Any artificially intelligent agent deployed in a real-world application must adapt on the fly to unknown environments. Researchers often rely on reinforcement and imitation learning to provide online adaptation to new tasks through trial-and-error learning. However, this can be challenging for complex tasks that require many timesteps or large numbers of subtasks to complete. These "long-horizon" tasks suffer from sample inefficiency and can require extremely long training times before the agent learns to perform the necessary long-term planning. In this work, we introduce CASE, which attempts to address these issues by training an Imitation Learning agent using adaptive "near future" subgoals. These subgoals are recalculated at each step using compositional arithmetic in a learned latent representation space. In addition to improving learning efficiency for standard long-term tasks, this approach also makes it possible to perform one-shot generalization to previously unseen tasks, given only a single reference trajectory for the task in a different environment. Our experiments show that the proposed approach consistently outperforms the previous state-of-the-art compositional Imitation Learning approach by 30%.
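To make "compositional arithmetic in a latent space" concrete, the sketch below shows the general idea under strong assumptions; it is not the CASE model. It assumes a hypothetical encoder that embeds each step with a fixed linear map and sum-pools over a trajectory, so that embeddings compose by vector addition. At every step the "still to do" vector is recomputed as the reference trajectory's embedding minus the embedding of the steps taken so far. In a learned model this relation would hold only approximately; here it is exact by construction. How CASE turns such a vector into a "near future" subgoal is not described in the abstract and is not modelled. The names `embed_step`, `embed_trajectory` and `remaining_task_vectors` are illustrative only.

```python
from typing import List, Sequence

Vector = List[float]


def embed_step(features: Sequence[float], weights: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical per-step encoder: a fixed linear map W @ x."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def embed_trajectory(traj: Sequence[Sequence[float]], weights: Sequence[Sequence[float]]) -> Vector:
    """Sum-pooled trajectory embedding (assumed additive, so composing
    sub-trajectories corresponds to adding their vectors)."""
    total = [0.0] * len(weights)
    for step in traj:
        for i, v in enumerate(embed_step(step, weights)):
            total[i] += v
    return total


def subtract(a: Vector, b: Vector) -> Vector:
    return [x - y for x, y in zip(a, b)]


def remaining_task_vectors(reference, rollout, weights) -> List[Vector]:
    """For t = 0..len(rollout), recompute g(reference) - g(rollout[:t])."""
    ref_vec = embed_trajectory(reference, weights)
    return [
        subtract(ref_vec, embed_trajectory(rollout[:t], weights))
        for t in range(len(rollout) + 1)
    ]


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]      # toy identity "encoder"
    ref = [[1.0, 0.0], [0.0, 1.0]]    # two-step reference trajectory
    for t, vec in enumerate(remaining_task_vectors(ref, ref, W)):
        print(t, vec)
    # prints: 0 [1.0, 1.0] / 1 [0.0, 1.0] / 2 [0.0, 0.0]
```

When the rollout has reproduced everything the reference trajectory did, the remaining vector reaches zero, which is the property a policy conditioned on this vector could use to tell which part of the task is still outstanding.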
