We introduce Language-Informed Latent Actions (LILA), a framework for learning natural language interfaces in the context of human-robot collaboration. LILA falls under the shared autonomy paradigm: in addition to providing discrete language inputs, humans are given a low-dimensional controller, e.g., a 2 degree-of-freedom (DoF) joystick that can move left/right and up/down, for operating the robot. LILA learns to use language to modulate this controller, providing users with a language-informed control space: given an instruction like "place the cereal bowl on the tray," LILA may learn a 2-DoF space where one dimension controls the distance from the robot's end-effector to the bowl, and the other dimension controls the robot's end-effector pose relative to the grasp point on the bowl. We evaluate LILA with real-world user studies, in which users provide a language instruction while operating a 7-DoF Franka Emika Panda Arm to complete a series of complex manipulation tasks. We show that LILA models are not only more sample-efficient and performant than imitation learning and end-effector control baselines, but that they are also qualitatively preferred by users.
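Concretely, the language-informed control space described above can be read as a small conditional decoder: the user's low-DoF controller input is decoded into a high-DoF robot action, conditioned on the current robot state and the language instruction. The sketch below illustrates one way such a decoder could look; the `LatentActionDecoder` module, the MLP architecture, the dimensions, and the placeholder language embedding are all our own assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch of a language-conditioned latent-action decoder in the
# spirit of LILA. Module names, dimensions, and architecture are
# illustrative assumptions, not the paper's exact design.
import torch
import torch.nn as nn

class LatentActionDecoder(nn.Module):
    """Maps (robot state, language embedding, low-DoF latent input) to a high-DoF action."""

    def __init__(self, state_dim=7, lang_dim=384, latent_dim=2, action_dim=7, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + lang_dim + latent_dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, hidden),
            nn.GELU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state, lang_emb, z):
        # z is the user's low-dimensional controller input (e.g., a 2-DoF
        # joystick deflection); language and state determine how each latent
        # dimension moves the arm.
        return self.net(torch.cat([state, lang_emb, z], dim=-1))

# Hypothetical control-loop usage: the same joystick deflection produces
# different end-effector motion depending on the instruction.
decoder = LatentActionDecoder()
state = torch.zeros(1, 7)       # e.g., 7-DoF joint configuration
lang_emb = torch.randn(1, 384)  # placeholder for an embedding of
                                # "place the cereal bowl on the tray"
z = torch.tensor([[1.0, 0.0]])  # user pushes the joystick right
action = decoder(state, lang_emb, z)  # 7-DoF robot command
```

The property this sketch is meant to capture is that the same 2-DoF input `z` yields different arm motions under different instructions, which is what makes the low-dimensional control space "language-informed" rather than a fixed mapping.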