Recent work in sim2real has successfully enabled robots to act in physical environments by training in simulation with a diverse "population" of environments (i.e., domain randomization). In this work, we focus on enabling generalization in assistive tasks: tasks in which the robot is acting to assist a user (e.g., helping someone with motor impairments with bathing or with scratching an itch). Such tasks are particularly interesting relative to prior sim2real successes because the environment now contains a human who is also acting. This complicates the problem because the diversity of human users (instead of merely physical environment parameters) is more difficult to capture in a population, thus increasing the likelihood of encountering out-of-distribution (OOD) human policies at test time. We advocate that generalization to such OOD policies benefits from (1) learning a good latent representation for human policies that test-time humans can accurately be mapped to, and (2) making that representation adaptable with test-time interaction data, instead of relying on it to perfectly capture the space of human policies based on the simulated population only. We study how to best learn such a representation by evaluating on purposefully constructed OOD test policies. We find that sim2real methods that encode environment (or population) parameters, and that work well in tasks robots do in isolation, do not work well in assistance. In assistance, it seems crucial to train the representation directly on the history of interaction, because that is what the robot will have access to at test time. Further, training these representations to predict human actions not only gives them better structure, but also enables them to be fine-tuned at test time, when the robot observes the partner act. Website: https://adaptive-caregiver.github.io.
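The adaptation idea described above can be sketched in miniature: a latent vector summarizes the human partner's policy, an action-prediction head sits on top of it, and at test time only the latent is fine-tuned against the human actions the robot actually observes. Everything below (linear encoder/decoder, dimensions, gradient updates) is an illustrative assumption for exposition, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: history features H, latent size Z, human action dim A.
H, Z, A = 8, 3, 2

# Weights standing in for simulation training on a population of human policies.
W_enc = rng.normal(size=(Z, H)) * 0.1  # encoder: interaction history -> latent
W_dec = rng.normal(size=(A, Z))        # action head: latent -> predicted human action

def encode(history):
    """Map an interaction-history feature vector to a latent human representation."""
    return W_enc @ history

def predict_action(z):
    """Predict the human's next action from the latent representation."""
    return W_dec @ z

def finetune_latent(z, histories, actions, lr=0.1, steps=200):
    """Adapt only the latent z to an OOD human via gradient descent on the
    squared error between predicted and observed human actions."""
    for _ in range(steps):
        grad = np.zeros_like(z)
        for _, a in zip(histories, actions):
            err = W_dec @ z - a          # prediction error on an observed action
            grad += W_dec.T @ err        # gradient of 0.5 * ||err||^2 w.r.t. z
        z = z - lr * grad / len(histories)
    return z

# A toy OOD human: actions generated from a latent the encoder mis-estimates.
z_true = np.array([1.0, -0.5, 0.3])
histories = [rng.normal(size=H) for _ in range(5)]
actions = [W_dec @ z_true + rng.normal(scale=0.01, size=A) for _ in histories]

z0 = encode(histories[0])                     # initial, inaccurate estimate
z_adapted = finetune_latent(z0, histories, actions)

err_before = np.linalg.norm(predict_action(z0) - actions[0])
err_after = np.linalg.norm(predict_action(z_adapted) - actions[0])
```

The design point this sketch is making: because the representation is trained to predict human actions, the same prediction loss is available at test time, so the robot can keep adapting the latent from observed interaction alone, without access to any ground-truth environment or population parameters.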