We propose an unsupervised vision-based system that estimates the joint configurations of a robot arm from a sequence of RGB or RGB-D images without requiring the robot model a priori, and we further adapt it to category-independent articulated object pose estimation. We combine a classical geometric formulation with deep learning and extend the epipolar constraint to multi-rigid-body systems to solve this task. Given a video sequence, optical flow is first estimated to obtain pixel-wise dense correspondences; the 6D pose is then computed by a modified PnP algorithm. The key idea is to leverage geometric constraints together with the constraints across multiple frames. Furthermore, we build a synthetic dataset containing different kinds of robots and multi-joint articulated objects to support research on vision-based robot control and robotic vision. We demonstrate the effectiveness of our method on three benchmark datasets and show that it achieves higher accuracy than state-of-the-art supervised methods in estimating the joint angles of robot arms and articulated objects.
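To illustrate the extension of the epipolar constraint to multiple rigid bodies (a minimal sketch based on the standard two-view relation, not necessarily the exact formulation used here), assume each rigid link $i$ undergoes its own relative motion $(R_i, \mathbf{t}_i)$ between two frames; a pair of normalized correspondences $(\mathbf{x}, \mathbf{x}')$ obtained from optical flow and belonging to link $i$ must then satisfy a per-link essential-matrix constraint:

\[
  \mathbf{x}'^{\top} E_i \, \mathbf{x} = 0, \qquad E_i = [\mathbf{t}_i]_{\times} R_i ,
\]

so dense correspondences can be associated with the link whose constraint they satisfy, and each link's 6D motion can subsequently be recovered and refined, e.g. with a PnP-style solver.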