In this paper, we address the problem of estimating the in-hand 6D pose of an object in contact with multiple vision-based tactile sensors. We reason about the possible spatial configurations of the sensors along the object surface. Specifically, we filter contact hypotheses using geometric reasoning and a Convolutional Neural Network (CNN), trained on simulated object-agnostic images, to promote those that better comply with the actual tactile images from the sensors. We use the selected sensor configurations to optimize over the space of 6D poses using a gradient descent-based approach. We finally rank the obtained poses by penalizing those that are in collision with the sensors. We carry out experiments in simulation using the DIGIT vision-based tactile sensor with several objects from the standard YCB model set. The results demonstrate that our approach estimates object poses that are compatible with the actual object-sensor contacts in $87.5\%$ of cases, while reaching an average positional error on the order of $2$ centimeters. Our analysis also includes qualitative results of experiments with a real DIGIT sensor.
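To make the pose-optimization step concrete, the following is a minimal, illustrative Python sketch (not the implementation used in the paper) of a gradient descent over 6D poses that drives hypothesized sensor contact points onto the object surface. The signed-distance function \texttt{box\_sdf} (a placeholder box shape) and the contact points \texttt{contacts} are assumptions introduced only for this example; the paper's pipeline additionally filters contact hypotheses and penalizes sensor collisions, which are omitted here.

\begin{verbatim}
import torch

def skew(k):
    """Skew-symmetric matrix of a 3-vector, built so gradients flow through."""
    z = torch.zeros((), dtype=k.dtype)
    return torch.stack([torch.stack([z, -k[2], k[1]]),
                        torch.stack([k[2], z, -k[0]]),
                        torch.stack([-k[1], k[0], z])])

def axis_angle_to_matrix(r):
    """Rodrigues' formula: axis-angle vector (3,) -> rotation matrix (3, 3)."""
    theta = torch.linalg.norm(r) + 1e-8
    K = skew(r / theta)
    return torch.eye(3, dtype=r.dtype) + torch.sin(theta) * K \
        + (1.0 - torch.cos(theta)) * (K @ K)

def box_sdf(p, half=torch.tensor([0.03, 0.02, 0.05], dtype=torch.float64)):
    """Placeholder signed-distance function (a 6 x 4 x 10 cm box); a real
    object model would supply its own differentiable SDF here."""
    q = torch.abs(p) - half
    return torch.linalg.norm(torch.clamp(q, min=0.0), dim=-1) \
        + torch.clamp(q.max(dim=-1).values, max=0.0)

# Hypothetical contact points measured at the sensors, in the hand frame.
contacts = torch.tensor([[0.05, 0.00, 0.01],
                         [-0.01, 0.04, 0.00],
                         [0.00, -0.03, 0.02]], dtype=torch.float64)

# 6D object pose in the hand frame: translation t and axis-angle rotation r.
t = torch.zeros(3, dtype=torch.float64, requires_grad=True)
r = torch.full((3,), 1e-3, dtype=torch.float64, requires_grad=True)
opt = torch.optim.Adam([t, r], lr=1e-2)

for step in range(500):
    opt.zero_grad()
    R = axis_angle_to_matrix(r)
    # Express the contacts in the object frame (R^T (p - t)) and penalize
    # their signed distance to the surface (SDF = 0 means "on the surface").
    p_obj = (contacts - t) @ R
    loss = (box_sdf(p_obj) ** 2).sum()
    loss.backward()
    opt.step()

print("estimated translation:", t.detach().tolist())
print("final contact residual:", loss.item())
\end{verbatim}

The parameterization (translation plus axis-angle rotation) keeps the optimization unconstrained, so any first-order optimizer can be used; the collision-based ranking described above would then be applied to the poses returned by several such runs.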