Paper title
A Versatile Agent for Fast Learning from Human Instructors
Authors
Abstract
In recent years, advances in machine learning have produced a large body of strong work on intelligent robotic policies. However, inefficiency and poor transferability have kept these algorithms from practical application, especially in human-robot collaboration, where few-shot fast learning and high flexibility are essential. To overcome this obstacle, we draw on a "Policy Pool" containing pre-trained skills that can be easily accessed and reused. An agent manages the "Policy Pool" by deploying the required skills in a flexible sequence that depends on a task-specific preference. This preference can be inferred automatically from one or a few human expert demonstrations. Under this hierarchical setting, our algorithm learns a sparse-reward, multi-stage skill from only one demonstration in a Mini-Grid environment, showing the potential for rapidly mastering complex robotic skills from human instructors. In addition, the design of our algorithm inherently supports lifelong learning, making it a versatile agent.
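The hierarchy described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the skill names, the `EVENT_TO_SKILL` mapping, the dictionary-based state, and the rule for reading a skill order off a demonstration are all hypothetical stand-ins, chosen only to show how a pool of reusable skills might be sequenced from a single demonstration.

```python
from typing import Callable, Dict, List

State = Dict[str, bool]
Skill = Callable[[State], State]

# Hypothetical stand-ins for pre-trained skills; in a real system each
# entry would be a learned low-level policy rather than a one-line function.
def pick_up_key(state: State) -> State:
    return {**state, "has_key": True}

def open_door(state: State) -> State:
    # The door only opens if the key was collected first.
    return {**state, "door_open": state.get("has_key", False)}

def go_to_goal(state: State) -> State:
    # The goal is only reachable through an open door.
    return {**state, "at_goal": state.get("door_open", False)}

# The "Policy Pool": skills stored by name so they can be looked up and reused.
POLICY_POOL: Dict[str, Skill] = {
    "pick_up_key": pick_up_key,
    "open_door": open_door,
    "go_to_goal": go_to_goal,
}

# Assumed mapping from an event visible in a demonstration to the skill
# that produces it. How the paper infers this is not stated in the abstract.
EVENT_TO_SKILL = {
    "has_key": "pick_up_key",
    "door_open": "open_door",
    "at_goal": "go_to_goal",
}

def infer_skill_sequence(demo: List[State]) -> List[str]:
    """Append a skill the first time its event becomes true in the demo."""
    sequence: List[str] = []
    seen = set()
    for state in demo:
        for event, skill in EVENT_TO_SKILL.items():
            if state.get(event) and event not in seen:
                seen.add(event)
                sequence.append(skill)
    return sequence

def run(sequence: List[str], state: State) -> State:
    """Execute the named skills from the pool in order."""
    for name in sequence:
        state = POLICY_POOL[name](state)
    return state

# A single hand-written demonstration, given as a list of observed states.
demo = [
    {},
    {"has_key": True},
    {"has_key": True, "door_open": True},
    {"has_key": True, "door_open": True, "at_goal": True},
]
plan = infer_skill_sequence(demo)
final_state = run(plan, {})
print(plan, final_state)
```

In this toy setting, adding a skill later only requires a new entry in `POLICY_POOL` (and in the assumed event mapping), which loosely mirrors the lifelong-learning property the abstract mentions.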