We are interested in estimating object states from touch during manipulation under occlusion. In this work, we address the specific problem of estimating object poses from touch during planar pushing. Vision-based tactile sensors provide rich, local image measurements at the point of contact. A single such measurement, however, contains limited information, and multiple measurements are needed to infer the latent object state. We solve this inference problem using a factor graph. To incorporate tactile measurements into the graph, we need local observation models that map high-dimensional tactile images onto a low-dimensional state space. Prior work has used low-dimensional force measurements or engineered functions to interpret tactile measurements. These methods, however, can be brittle and difficult to scale across objects and sensors. Our key insight is to directly learn tactile observation models that predict the relative pose of the sensor given a pair of tactile images. These relative poses can then be incorporated as factors within a factor graph. We propose a two-stage approach: first, we learn local tactile observation models supervised with ground-truth data, and then we integrate these models, along with physics and geometric factors, within a factor graph optimizer. We demonstrate reliable object tracking using only tactile feedback for 150 real-world planar pushing sequences with varying trajectories across three object shapes. Supplementary video: https://youtu.be/y1kBfSmi8w0
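
To make the idea of a relative-pose factor concrete, the following is a minimal sketch, not the implementation used in this work. It assumes planar poses written as (x, y, theta) tuples and treats the measured relative pose as a stand-in for the output of a learned tactile observation model. The function names, the componentwise residual (a simplification of a proper on-manifold error), and the noise parameters `sigmas` are all illustrative assumptions.

```python
import math


def wrap_angle(theta):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (theta + math.pi) % (2.0 * math.pi) - math.pi


def relative_pose(pose_a, pose_b):
    """Return pose_b expressed in the frame of pose_a.

    Both poses are planar (x, y, theta) tuples in a common world frame.
    """
    xa, ya, ta = pose_a
    xb, yb, tb = pose_b
    dx, dy = xb - xa, yb - ya
    c, s = math.cos(ta), math.sin(ta)
    # Rotate the world-frame offset into the frame of pose_a.
    return (c * dx + s * dy, -s * dx + c * dy, wrap_angle(tb - ta))


def relative_pose_residual(pose_a, pose_b, measured):
    """Residual of a hypothetical relative-pose factor.

    Compares the relative pose implied by the current estimates with a
    measured relative pose (assumed here to come from a learned model).
    The componentwise difference is a simplification for illustration.
    """
    pred = relative_pose(pose_a, pose_b)
    return (
        pred[0] - measured[0],
        pred[1] - measured[1],
        wrap_angle(pred[2] - measured[2]),
    )


def factor_cost(residual, sigmas):
    """Squared residual weighted by assumed per-component standard deviations."""
    return sum((r / s) ** 2 for r, s in zip(residual, sigmas))
```

In a factor graph optimizer, a cost of this form would be summed over all pairs of time steps with a tactile measurement, together with the physics and geometric terms, and minimized over the pose variables.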