In this paper, we address the problem of using visuo-tactile feedback for 6-DoF localization and 3D reconstruction of unknown in-hand objects. We propose FingerSLAM, a closed-loop factor graph-based pose estimator that combines local tactile sensing at the fingertip and global vision sensing from a wrist-mounted camera. FingerSLAM is built from two constituent pose estimators: a multi-pass refined tactile-based pose estimator that captures movements from detailed local textures, and a single-pass vision-based pose estimator that predicts from a global view of the object. We also design a loop closure mechanism that actively matches current vision and tactile images to previously stored key-frames to reduce accumulated error. FingerSLAM fuses the tactile and vision modalities, together with the loop closure constraints, in a factor graph-based optimization framework, which produces pose estimates that are more accurate than those of either standalone estimator. The estimated poses are then used to reconstruct the shape of the unknown object incrementally by stitching the local point clouds recovered from tactile images. We train our system on real-world data collected with 20 objects. We demonstrate reliable visuo-tactile pose estimation and shape reconstruction through quantitative and qualitative real-world evaluations on 6 objects that are unseen during training.
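The core idea of fusing relative tactile motion, absolute vision estimates, and loop closures in one factor graph can be illustrated with a heavily simplified sketch. This is not FingerSLAM's implementation. It assumes a single rotation angle (1-DoF) instead of a full 6-DoF pose, so every factor is linear and the graph can be solved in closed form through the normal equations. The helper names (`relative`, `absolute`, `solve`), the weights, and the toy measurements are all hypothetical. With full 6-DoF poses, the same optimization becomes nonlinear and requires an iterative solver.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Factor:
    """Linear measurement on scalar (1-DoF) poses: sum_k coeffs[k] * x[k] ~= value."""
    coeffs: Dict[int, float]
    value: float
    weight: float  # information (inverse variance) of the measurement


def relative(i: int, j: int, delta: float, weight: float) -> Factor:
    # Between-factor x_j - x_i = delta: tactile increments or loop closures.
    return Factor({i: -1.0, j: 1.0}, delta, weight)


def absolute(i: int, z: float, weight: float) -> Factor:
    # Unary factor x_i = z: a prior or a single-pass vision estimate.
    return Factor({i: 1.0}, z, weight)


def solve(num_poses: int, factors: List[Factor]) -> List[float]:
    """Weighted linear least squares via the normal equations H x = g."""
    n = num_poses
    H = [[0.0] * n for _ in range(n)]
    g = [0.0] * n
    # Accumulate H = A^T W A and g = A^T W b, one factor at a time.
    for f in factors:
        for a, ca in f.coeffs.items():
            g[a] += f.weight * ca * f.value
            for b, cb in f.coeffs.items():
                H[a][b] += f.weight * ca * cb
    # Gaussian elimination with partial pivoting on the augmented matrix [H | g].
    M = [H[r][:] + [g[r]] for r in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("under-constrained graph: anchor it with a prior or vision factor")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            scale = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= scale * M[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


# Hypothetical toy data: true angles are 0, 10, 20, 30 degrees, and each
# tactile increment overestimates by 1 degree, so dead reckoning drifts to 33.
prior = [absolute(0, 0.0, 100.0)]
tactile = [relative(i, i + 1, 11.0, 1.0) for i in range(3)]
vision = [absolute(i, 10.0 * i, 1.0) for i in range(1, 4)]
loop = [relative(0, 3, 30.0, 10.0)]  # hypothetical match back to key-frame 0

dead_reckoning = solve(4, prior + tactile)        # tactile only: drifts
fused = solve(4, prior + tactile + vision)        # vision pulls drift back
with_loop = solve(4, prior + tactile + loop)      # loop closure bounds drift
```

In this toy setup, tactile-only dead reckoning ends near 33 degrees. Adding the vision factors brings the final estimate to roughly 30.6 degrees, and a single strongly weighted loop closure brings it to roughly 30.1 degrees. This mirrors, in miniature, the motivation for combining the two modalities with loop closure.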