Predicting the future interaction of objects when they come into contact with their environment is key for autonomous agents to take intelligent and anticipatory actions. This paper presents a perception framework that fuses visual and tactile feedback to make predictions about the expected motion of objects in dynamic scenes. Visual information captures object properties such as 3D shape and location, while tactile information provides critical cues about interaction forces and the resulting object motion when objects make contact with the environment. Utilizing a novel See-Through-your-Skin (STS) sensor that provides high-resolution multimodal sensing of contact surfaces, our system captures both the visual appearance and the tactile properties of objects. We interpret the dual-stream signals from the sensor using a Multimodal Variational Autoencoder (MVAE), allowing us to jointly capture both modalities of contacting objects and to learn a mapping from visual to tactile interaction and vice versa. Additionally, the perceptual system can be used to infer the outcome of future physical interactions, which we validate through simulated and real-world experiments in which the resting state of an object is predicted from given initial conditions.
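To make the multimodal fusion concrete, the sketch below shows a minimal product-of-experts MVAE (in the style of Wu and Goodman) with two modality encoders, one for a flattened visual observation and one for a tactile observation, fused into a shared latent code. The layer sizes, `visual_dim`/`tactile_dim` dimensions, and MLP encoders/decoders are illustrative assumptions, not the architecture used in the paper.

```python
import torch
import torch.nn as nn


class ModalityEncoder(nn.Module):
    """Encodes one modality into the parameters of a diagonal Gaussian
    over the shared latent space."""
    def __init__(self, input_dim, latent_dim, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.mu = nn.Linear(hidden_dim, latent_dim)
        self.logvar = nn.Linear(hidden_dim, latent_dim)

    def forward(self, x):
        h = self.net(x)
        return self.mu(h), self.logvar(h)


class ModalityDecoder(nn.Module):
    """Reconstructs one modality from a sample of the shared latent code."""
    def __init__(self, latent_dim, output_dim, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, output_dim))

    def forward(self, z):
        return self.net(z)


def product_of_experts(mus, logvars):
    """Fuse per-modality Gaussian posteriors (plus an implicit standard-normal
    prior expert) into a single joint Gaussian posterior."""
    mus = [torch.zeros_like(mus[0])] + mus          # prior expert N(0, I)
    logvars = [torch.zeros_like(logvars[0])] + logvars
    precisions = [torch.exp(-lv) for lv in logvars]
    precision = torch.stack(precisions).sum(0)
    mu = torch.stack([m * p for m, p in zip(mus, precisions)]).sum(0) / precision
    return mu, -torch.log(precision)


class MultimodalVAE(nn.Module):
    """Minimal two-modality (visual + tactile) MVAE sketch."""
    def __init__(self, visual_dim, tactile_dim, latent_dim=32):
        super().__init__()
        self.enc_visual = ModalityEncoder(visual_dim, latent_dim)
        self.enc_tactile = ModalityEncoder(tactile_dim, latent_dim)
        self.dec_visual = ModalityDecoder(latent_dim, visual_dim)
        self.dec_tactile = ModalityDecoder(latent_dim, tactile_dim)

    def forward(self, visual=None, tactile=None):
        # Either modality may be missing; the available ones are fused.
        mus, logvars = [], []
        if visual is not None:
            m, lv = self.enc_visual(visual)
            mus.append(m); logvars.append(lv)
        if tactile is not None:
            m, lv = self.enc_tactile(tactile)
            mus.append(m); logvars.append(lv)
        mu, logvar = product_of_experts(mus, logvars)
        # Reparameterization trick.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.dec_visual(z), self.dec_tactile(z), mu, logvar
```

Because both decoders operate on the same latent code, conditioning on only one modality at inference time (e.g. `model(visual=v)`) yields a reconstruction of the other, which is what enables the visual-to-tactile mapping (and vice versa) described above.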