A robot operating in a household makes observations of multiple objects as it moves around over the course of days or weeks. The objects may be moved by inhabitants, but not completely at random. The robot may be called upon later to retrieve objects and will need a long-term object-based memory in order to know how to find them. Existing work in semantic SLAM does not attempt to capture the dynamics of object movement. In this paper, we combine some aspects of classic techniques for data-association filtering with modern attention-based neural networks to construct object-based memory systems that operate on high-dimensional observations and hypotheses. We perform end-to-end learning on labeled observation trajectories to learn both the transition and observation models. We demonstrate the system's effectiveness in maintaining memory of dynamically changing objects in both simulated environments and real images, and show improvements over classical structured approaches as well as unstructured neural approaches. Additional information is available at the project website: https://yilundu.github.io/obm/.