Eye-in-hand camera calibration is a fundamental and long-studied problem in robotics. We present a study of learning-based methods for solving this problem online from a single RGB image, with models trained entirely on synthetic data. We study three main approaches: a direct regression model that predicts the extrinsic matrix from an image; a sparse correspondence model that regresses 2D keypoints and then uses Perspective-n-Point (PnP); and a dense correspondence model that uses regressed depth and segmentation maps to enable Iterative Closest Point (ICP) pose estimation. In our experiments, we benchmark these methods against each other and against well-established classical methods, and find the surprising result that direct regression outperforms the other approaches. We also perform a noise-sensitivity analysis to gain further insight into these results.
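To make the role of the extrinsic matrix concrete, the following is a minimal sketch of the pinhole projection that a PnP solver inverts: given 3D keypoints and an extrinsic transform, it predicts 2D pixel locations and measures the reprojection error against observed keypoints. It is not the method studied here. The function names, the intrinsics (`fx`, `fy`, `cx`, `cy`), the keypoint coordinates, and the z-axis-only rotation are all hypothetical choices made for illustration.

```python
import math

def make_extrinsic(yaw_rad, translation):
    """Build a 4x4 rigid transform: rotation about the z-axis plus a translation.
    (Hypothetical helper; a real extrinsic has a full 3-DoF rotation.)"""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    tx, ty, tz = translation
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]

def project(point_3d, extrinsic, fx, fy, cx, cy):
    """Map a 3D point into the camera frame, then apply a pinhole projection."""
    x, y, z = point_3d
    homog = (x, y, z, 1.0)
    # Apply the top three rows of the extrinsic to the homogeneous point.
    xc, yc, zc = (sum(row[i] * homog[i] for i in range(4)) for row in extrinsic[:3])
    if zc <= 0:
        raise ValueError("point is behind the camera")
    return (fx * xc / zc + cx, fy * yc / zc + cy)

def mean_reprojection_error(points_3d, pixels, extrinsic, fx, fy, cx, cy):
    """Average pixel distance between observed and reprojected keypoints."""
    total = 0.0
    for p, (u, v) in zip(points_3d, pixels):
        u_hat, v_hat = project(p, extrinsic, fx, fy, cx, cy)
        total += math.hypot(u_hat - u, v_hat - v)
    return total / len(points_3d)

# Assumed intrinsics and keypoints, for illustration only.
FX, FY, CX, CY = 100.0, 100.0, 50.0, 40.0
keypoints = [(0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (-0.1, 0.0, 0.05), (0.0, -0.1, 0.05)]

true_T = make_extrinsic(0.2, (0.0, 0.0, 1.0))
observed = [project(p, true_T, FX, FY, CX, CY) for p in keypoints]

# A slightly wrong extrinsic estimate yields a non-zero reprojection error.
guess_T = make_extrinsic(0.25, (0.01, 0.0, 1.0))
err_true = mean_reprojection_error(keypoints, observed, true_T, FX, FY, CX, CY)
err_guess = mean_reprojection_error(keypoints, observed, guess_T, FX, FY, CX, CY)
```

A PnP solver searches for the extrinsic that drives this error toward zero given regressed 2D keypoints, whereas a direct regression model would output the matrix without this intermediate geometric step.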