As robots perform manipulation tasks and interact with objects, they will occasionally drop objects that subsequently bounce out of their visual fields (e.g., due to an inadequate grasp of an unfamiliar object). To enable robots to recover from such errors, we draw upon the concept of object permanence: objects continue to exist even when they are not being sensed (e.g., seen) directly. In particular, we developed a multimodal neural network model that takes a partial, observed bounce trajectory and the audio of the drop impact as inputs and predicts the full bounce trajectory and the end location of the dropped object. We empirically show that (1) our multimodal method predicted end locations close to the actual locations (i.e., within the visual field of the robot's wrist camera), and (2) the robot was able to retrieve dropped objects by applying minimal vision-based pick-up adjustments. Additionally, our method outperformed five comparison baselines in retrieving dropped objects.
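To make the input-output structure of such a model concrete, the following is a minimal sketch of a two-branch multimodal network: one encoder for the partial bounce trajectory, one for an audio feature vector of the impact sound, with the two embeddings concatenated and decoded into a predicted end location. All dimensions, the flattened-trajectory input format, and the NumPy forward pass with random weights are illustrative assumptions, not the authors' architecture or trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w1, b1, w2, b2):
    """Two-layer perceptron with ReLU; used here as a simple encoder/decoder."""
    h = np.maximum(x @ w1 + b1, 0.0)
    return h @ w2 + b2

def init(n_in, n_hidden, n_out):
    """Randomly initialized weights standing in for trained parameters."""
    return (rng.normal(size=(n_in, n_hidden)) * 0.1, np.zeros(n_hidden),
            rng.normal(size=(n_hidden, n_out)) * 0.1, np.zeros(n_out))

# Hypothetical inputs: 10 observed 3-D trajectory points (flattened to 30)
# and a 64-bin feature vector summarizing the drop-impact audio.
traj = rng.normal(size=(1, 10 * 3))
audio = rng.normal(size=(1, 64))

traj_enc = init(30, 64, 32)    # trajectory branch -> 32-d embedding
audio_enc = init(64, 64, 32)   # audio branch      -> 32-d embedding
head = init(64, 64, 2)         # fused embedding   -> 2-D end location

# Late fusion: concatenate per-modality embeddings, then decode.
z = np.concatenate([mlp(traj, *traj_enc), mlp(audio, *audio_enc)], axis=1)
end_xy = mlp(z, *head)         # predicted end location on the support plane
print(end_xy.shape)            # (1, 2)
```

In practice the trajectory branch would typically be a recurrent or temporal model and the audio branch would consume a spectrogram, but the fusion pattern (encode each modality, concatenate, decode) is the same.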