A robot working in a physical environment (such as a home or a factory) needs to learn to use the various available tools to accomplish different tasks, for instance, a mop for cleaning and a tray for carrying objects. The number of possible tools is large, and it may not be feasible to demonstrate the use of each individual tool during training. Can a robot learn commonsense knowledge and adapt to novel settings where some known tools are missing but alternative, unseen tools are present? We present a neural model that predicts the best tool from the available objects for achieving a given declarative goal. The model is trained on user demonstrations, which we crowd-source by having humans instruct a robot in a physics simulator. The resulting dataset records user plans involving multi-step object interactions, along with the accompanying symbolic state changes. Our neural model, ToolNet, combines a graph neural network that encodes the current environment state with goal-conditioned spatial attention that predicts the appropriate tool. We find that providing metric and semantic properties of objects, together with pre-trained object embeddings derived from a commonsense knowledge repository such as ConceptNet, significantly improves the model's ability to generalize to unseen tools. The model makes accurate and generalizable tool predictions: compared with a graph neural network baseline, it achieves a 14-27% accuracy improvement in predicting known tools in new world scenes and a 44-67% improvement in generalization to novel objects not encountered during training.
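The abstract does not specify ToolNet's layers, so the following is only a minimal, untrained sketch of the general pattern it describes: encode the scene as a graph, then weight candidate tools by attention conditioned on the goal. It assumes a single round of mean-aggregation message passing and plain dot-product attention over object nodes (a simplification of the paper's goal-conditioned spatial attention). The object features, proximity edges, goal vector, and the functions `message_pass` and `predict_tool` are all hypothetical; in the paper, node features would include metric and semantic properties and ConceptNet-derived embeddings.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def message_pass(features: Dict[str, Vector],
                 edges: List[Tuple[str, str]]) -> Dict[str, Vector]:
    """One round of mean-aggregation message passing over an undirected scene graph.

    Each node's new embedding is the average of its own features and its neighbours' features.
    """
    neighbours = {node: [node] for node in features}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    dim = len(next(iter(features.values())))
    return {
        node: [sum(features[m][i] for m in nbrs) / len(nbrs) for i in range(dim)]
        for node, nbrs in neighbours.items()
    }


def predict_tool(features: Dict[str, Vector],
                 edges: List[Tuple[str, str]],
                 goal: Vector,
                 candidates: List[str]) -> Tuple[str, Dict[str, float]]:
    """Hypothetical tool predictor: attention between each candidate's graph embedding and the goal.

    Returns the highest-weighted candidate and the full attention distribution.
    """
    embeddings = message_pass(features, edges)
    weights = softmax([dot(embeddings[c], goal) for c in candidates])
    attention = dict(zip(candidates, weights))
    return max(attention, key=attention.get), attention


# Hypothetical hand-made 2-D features: [cleaning-related, carrying-related].
scene = {
    "mop": [1.0, 0.0],
    "tray": [0.0, 1.0],
    "floor": [0.5, 0.0],
    "cup": [0.0, 0.5],
}
near = [("mop", "floor"), ("tray", "cup")]  # hypothetical spatial-proximity edges

best, attn = predict_tool(scene, near, goal=[1.0, 0.0], candidates=["mop", "tray"])
print(best)  # mop
```

With the goal vector changed to `[0.0, 1.0]` (a carrying goal), the tray receives the higher weight instead. A trained model would learn the encoder and attention parameters rather than relying on fixed averages and raw dot products.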