Estimating the 6D pose of textureless objects from RGB images is an important problem in robotics. Due to appearance ambiguities, rotational symmetries, and severe occlusions, single-view 6D pose estimators are still unable to handle a wide range of objects, motivating research into multi-view pose estimation and next-best-view prediction to address these limitations. In this work, we propose a comprehensive active perception framework for estimating the 6D poses of textureless objects using only RGB images. Our approach is built on a key idea: decoupling 6D pose estimation into a two-step sequential process greatly improves both accuracy and efficiency. First, we estimate the 3D translation of each object, resolving the scale and depth ambiguities inherent to RGB images. These estimates then simplify the subsequent task of determining the 3D orientation, which we solve through canonical-scale template matching. Building on this formulation, we introduce an active perception strategy that predicts the next-best camera viewpoint from which to capture an RGB image, effectively reducing object pose uncertainty and improving pose accuracy. We evaluate our method on the public ROBI and TOD datasets, as well as on our reconstructed transparent-object dataset, T-ROBI. Under identical camera viewpoints, our multi-view pose estimation significantly outperforms state-of-the-art approaches. Moreover, by leveraging our next-best-view strategy, our approach achieves high pose accuracy with fewer viewpoints than heuristic-based policies across all evaluated datasets. The accompanying video and the T-ROBI dataset will be released on our project page: https://trailab.github.io/ActiveODPE.
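To make the two-step formulation concrete, below is a minimal toy sketch, assuming stand-ins throughout: `estimate_translation` back-projects a pixel at an assumed depth in place of the paper's learned translation estimator, the "templates" are random unit feature vectors rather than rendered canonical-scale views, and the next-best-view score is a simple entropy reduction over a softmax rotation distribution. None of these names or scores come from the paper; this only illustrates the decoupling of translation from orientation and an uncertainty-driven view choice.

```python
# Toy sketch of translation-then-orientation pose estimation with an
# entropy-based next-best-view score. All interfaces here are assumptions,
# not the paper's actual networks, templates, or matching criterion.
import numpy as np

rng = np.random.default_rng(0)

def estimate_translation(center_px, depth_m, K):
    """Step 1 (assumed stand-in): back-project a detection center at an
    assumed depth, resolving the scale/depth ambiguity of a single RGB view."""
    u, v = center_px
    x = (u - K[0, 2]) / K[0, 0] * depth_m
    y = (v - K[1, 2]) / K[1, 1] * depth_m
    return np.array([x, y, depth_m])

def match_orientation(query_feat, template_feats):
    """Step 2 (assumed stand-in): with translation fixed, the object crop can
    be rescaled to a canonical scale, so orientation reduces to matching
    against rotation templates. Returns a softmax distribution over rotations."""
    scores = template_feats @ query_feat          # cosine similarity (unit vectors)
    p = np.exp(scores - scores.max())
    return p / p.sum()

def entropy(p):
    return float(-(p * np.log(p + 1e-12)).sum())

def fuse(p, q):
    """Multiplicative fusion of two rotation distributions, renormalized."""
    post = p * q
    return post / post.sum()

# Current view: estimate translation, then the rotation distribution.
K = np.array([[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]])
t = estimate_translation((350, 260), depth_m=0.8, K=K)

n_rot = 64
templates = rng.normal(size=(n_rot, 32))
templates /= np.linalg.norm(templates, axis=1, keepdims=True)
query = templates[17] + 0.3 * rng.normal(size=32)   # noisy observation of rotation 17
query /= np.linalg.norm(query)
p_rot = match_orientation(query, templates)
print("translation:", t, "| rotation entropy:", round(entropy(p_rot), 3))

# Next-best-view (assumed scoring): simulate an observation from each candidate
# viewpoint and pick the one whose fused posterior has the lowest entropy.
candidate_obs = [templates[17] + 0.3 * rng.normal(size=32) for _ in range(5)]
candidate_obs = [c / np.linalg.norm(c) for c in candidate_obs]
best = min(range(len(candidate_obs)),
           key=lambda i: entropy(fuse(p_rot, match_orientation(candidate_obs[i], templates))))
print("next-best-view candidate:", best)
```

In this toy version, each candidate view is scored by the posterior entropy after fusing its simulated rotation evidence with the current belief; the paper's actual viewpoint scoring is not specified in the abstract, so the entropy criterion here is purely illustrative.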