Robots that work alongside humans should be able to understand instructions given in natural language. Existing language-conditioned imitation learning models predict actuator commands directly from the image observation and the instruction text. Instead, we propose translating the natural language instruction into a Python function that queries the scene through the output of an object detector and controls the robot to perform the specified task. This enables the use of non-differentiable modules, such as a constraint solver, when computing commands for the robot. Moreover, the labels in this setup are computer programs, which capture the expert's intent and are significantly more informative than teleoperated demonstrations. We show that the proposed method performs better than training a neural network to directly predict the robot actions.
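To make the idea concrete, the following is a minimal sketch of what a program generated for an instruction such as "put the red block to the right of the green block" might look like. It is an illustration only, not the interface used in this work: `Detection`, `RecordingRobot`, `find`, `free_spot_right_of`, and the millimetre coordinate convention are all hypothetical names and assumptions, and the brute-force search merely stands in for a real constraint solver.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[int, int]  # assumed convention: table coordinates in millimetres


@dataclass
class Detection:
    """One object reported by a (hypothetical) object detector."""
    label: str
    color: str
    xy: Point


@dataclass
class RecordingRobot:
    """Stand-in robot that records commands instead of moving hardware."""
    log: List[Tuple[str, Point]] = field(default_factory=list)

    def pick(self, xy: Point) -> None:
        self.log.append(("pick", xy))

    def place(self, xy: Point) -> None:
        self.log.append(("place", xy))


def find(detections: List[Detection], label: str, color: str) -> Detection:
    """Return the first detection matching label and color."""
    for d in detections:
        if d.label == label and d.color == color:
            return d
    raise LookupError(f"no {color} {label} in the scene")


def free_spot_right_of(anchor: Detection, obstacles: List[Detection],
                       clearance: int = 50, step: int = 10,
                       max_steps: int = 100) -> Point:
    """Scan rightwards from the anchor for a spot that keeps `clearance`
    from every obstacle. A discrete, non-differentiable search standing
    in for a constraint solver."""
    x, y = anchor.xy
    for i in range(max_steps + 1):
        candidate = (x + clearance + i * step, y)
        if all(math.dist(candidate, o.xy) >= clearance for o in obstacles):
            return candidate
    raise RuntimeError("no free spot found")


def put_red_block_right_of_green_block(detections: List[Detection],
                                       robot: RecordingRobot) -> None:
    """Example of the kind of function an instruction could be translated to."""
    red = find(detections, "block", "red")
    green = find(detections, "block", "green")
    # The red block is being moved, so it is not an obstacle for itself.
    obstacles = [d for d in detections if d is not red]
    robot.pick(red.xy)
    robot.place(free_spot_right_of(green, obstacles))
```

Because the program is ordinary Python, the placement search does not need to be differentiable, and the same function can be run against any scene the detector reports.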