Robots assisting us in environments such as factories or homes must learn to use objects as tools to perform tasks, for instance using a tray to carry objects. We consider the problem of learning commonsense knowledge of when a tool may be useful and how its use may be composed with that of other tools to accomplish a high-level task instructed by a human. Specifically, we introduce a novel neural model, termed TOOLTANGO, that first predicts the next tool to be used and then uses this information to predict the next action. We show that this joint model can inform the learning of a fine-grained policy that enables the robot to use a particular tool in sequence, and that it adds significant value by making the model more accurate. TOOLTANGO encodes the world state, comprising objects and the symbolic relationships between them, using a graph neural network, and is trained on demonstrations from human teachers instructing a virtual robot in a physics simulator. The model learns to attend over the scene using knowledge of the goal and the action history, and finally decodes the symbolic action to execute. Crucially, we address generalization to unseen environments where some known tools are missing but alternative, unseen tools are present. We show that by augmenting the representation of the environment with pre-trained embeddings derived from a knowledge base, the model can generalize effectively to novel environments. Experimental results show an absolute improvement of 48.8-58.1% over the baselines in predicting successful symbolic plans for a simulated mobile manipulator in novel environments with unseen objects. This work takes a step toward enabling robots to rapidly synthesize robust plans for complex tasks, particularly in novel settings.
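The two-stage idea described above (predict the tool first, then the action conditioned on it) can be illustrated with a toy sketch. This is not the TOOLTANGO model: there is no graph neural network or learning here. The three-dimensional embeddings, the goal vector, the action names and the functions `predict_tool` and `predict_action` are all hypothetical, hand-made stand-ins for goal-conditioned attention over knowledge-base embeddings.

```python
import math

# Hypothetical hand-made embeddings standing in for pre-trained
# knowledge-base vectors; the values are illustrative assumptions only.
EMBEDDINGS = {
    "tray": [0.9, 0.1, 0.0],
    "box": [0.8, 0.2, 0.1],
    "stool": [0.1, 0.9, 0.0],
    "apple": [0.0, 0.1, 0.9],
}

# Hypothetical goal encoding for a "carry objects" task.
GOAL_CARRY = [1.0, 0.0, 0.0]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_tool(goal, scene_objects):
    """Stage 1: attend over scene objects with the goal vector, pick the top one."""
    scores = [
        sum(g * x for g, x in zip(goal, EMBEDDINGS[obj]))
        for obj in scene_objects
    ]
    weights = softmax(scores)
    best = max(range(len(scene_objects)), key=lambda i: weights[i])
    return scene_objects[best], weights


def predict_action(tool, history):
    """Stage 2: choose a symbolic action conditioned on the tool and action history.

    The action vocabulary ("pick", "place_on") is a placeholder assumption.
    """
    if ("pick", tool) not in history:
        return ("pick", tool)
    return ("place_on", tool)


if __name__ == "__main__":
    # Scene seen in training: the tray is available.
    tool, _ = predict_tool(GOAL_CARRY, ["tray", "stool", "apple"])
    print(tool, predict_action(tool, []))

    # Novel scene: the tray is missing, but a box with a similar embedding is present.
    tool, _ = predict_tool(GOAL_CARRY, ["box", "stool", "apple"])
    print(tool, predict_action(tool, [("pick", tool)]))
```

In the second scene the box is chosen because its assumed embedding lies close to the tray's. That is the intuition behind using pre-trained embeddings to substitute an unseen tool for a missing one.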