Paper Title
Learning Representations that Enable Generalization in Assistive Tasks
Paper Authors
Paper Abstract
Recent work in sim2real has successfully enabled robots to act in physical environments by training in simulation with a diverse "population" of environments (i.e., domain randomization). In this work, we focus on enabling generalization in assistive tasks: tasks in which the robot acts to assist a user (e.g., helping someone with motor impairments with bathing or with scratching an itch). Such tasks are particularly interesting relative to prior sim2real successes because the environment now contains a human who is also acting. This complicates the problem because the diversity of human users (rather than merely of physical environment parameters) is more difficult to capture in a population, increasing the likelihood of encountering out-of-distribution (OOD) human policies at test time. We argue that generalization to such OOD policies benefits from (1) learning a good latent representation of human policies that test-time humans can accurately be mapped to, and (2) making that representation adaptable with test-time interaction data, instead of relying on it to perfectly capture the space of human policies from the simulated population alone. We study how to best learn such a representation by evaluating on purposefully constructed OOD test policies. We find that sim2real methods that encode environment (or population) parameters, which work well in tasks robots perform in isolation, do not work well in assistance. In assistance, it seems crucial to train the representation directly on the history of interaction, because that is what the robot has access to at test time. Further, training these representations to predict human actions not only gives them better structure, but also enables them to be fine-tuned at test time as the robot observes the partner act. Project page: https://adaptive-caregiver.github.io.
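The core recipe in the abstract — encode the interaction history into a latent representation of the human's policy, train that representation to predict human actions, then fine-tune it on test-time interaction data — can be illustrated with a deliberately minimal sketch. This is a hypothetical toy, not the paper's implementation: a linear encoder/decoder with hand-written MSE gradients stands in for the neural networks and simulated assistive environment a real system would use, and all dimensions and names (`W_enc`, `W_dec`, `mse_step`) are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

H = 8   # dimension of the (flattened) interaction history
Z = 3   # dimension of the latent representation of the human policy
A = 2   # dimension of the human action

def predict(W_enc, W_dec, history):
    """Encode history -> latent z, then decode z -> predicted human action."""
    z = W_enc @ history
    return W_dec @ z, z

def mse_step(W_enc, W_dec, history, action, lr=0.01):
    """One SGD step on ||W_dec W_enc h - a||^2 (gradients written by hand)."""
    pred, z = predict(W_enc, W_dec, history)
    err = pred - action
    W_dec_new = W_dec - lr * np.outer(err, z)
    W_enc_new = W_enc - lr * np.outer(W_dec.T @ err, history)
    return W_enc_new, W_dec_new

# "Simulated population": interaction histories paired with the human actions
# they produced, generated here by a ground-truth linear human policy.
W_true = rng.normal(size=(A, H))
train = [(h, W_true @ h) for h in rng.normal(size=(200, H))]

W_enc = rng.normal(scale=0.1, size=(Z, H))
W_dec = rng.normal(scale=0.1, size=(A, Z))
for _ in range(100):
    for h, a in train:
        W_enc, W_dec = mse_step(W_enc, W_dec, h, a)

def avg_error(W_enc, W_dec, data):
    return float(np.mean([np.sum((predict(W_enc, W_dec, h)[0] - a) ** 2)
                          for h, a in data]))

# Test time: an out-of-distribution partner whose policy was never simulated.
W_ood = W_true + rng.normal(scale=0.5, size=(A, H))
obs = [(h, W_ood @ h) for h in rng.normal(size=(20, H))]

before = avg_error(W_enc, W_dec, obs)
for _ in range(5):  # fine-tune on the few actions the robot has observed
    for h, a in obs:
        W_enc, W_dec = mse_step(W_enc, W_dec, h, a)
after = avg_error(W_enc, W_dec, obs)
print(after < before)
```

The point of the toy is the interface, not the model class: because the representation is trained on interaction history with an action-prediction loss, the same loss is available at test time, so a few gradient steps on observed partner behavior adapt the representation to an OOD human without any access to hidden environment parameters.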