6D pose estimation of rigid objects from RGB-D images is crucial for object grasping and manipulation in robotics. Although the RGB channels and the depth (D) channel are often complementary, providing appearance and geometry information respectively, it remains non-trivial to fully benefit from the two modalities. We start from a simple yet new observation: when an object rotates, its semantic label is invariant to the pose, while its keypoint offset directions vary with the pose. Based on this observation, we present SO(3)-Pose, a new representation learning network that explores SO(3)-equivariant and SO(3)-invariant features from the depth channel for pose estimation. The SO(3)-invariant features facilitate learning more distinctive representations for segmenting objects that have a similar appearance in the RGB channels. The SO(3)-equivariant features communicate with RGB features to deduce the (missing) geometry for detecting keypoints of an object with a reflective surface from the depth channel. Unlike most existing pose estimation methods, SO(3)-Pose not only implements information communication between the RGB and depth channels, but also naturally absorbs SO(3)-equivariant geometry knowledge from depth images, leading to better appearance and geometry representation learning. Comprehensive experiments show that our method achieves state-of-the-art performance on three benchmarks.
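The observation behind the method can be checked numerically on a toy example. The sketch below is not the SO(3)-Pose network. It only illustrates, under stated assumptions, what equivariance and invariance mean for keypoint offsets: the toy points, the keypoint, and all helper names (`rotation_z`, `apply`, `keypoint_offsets`, `norm`) are hypothetical, and the offset length is used as a stand-in for a pose-invariant quantity such as a semantic label.

```python
import math

def rotation_z(theta):
    """Return a 3x3 rotation matrix (an element of SO(3)) about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def apply(R, p):
    """Rotate a 3D point or vector p by the matrix R."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) for i in range(3))

def keypoint_offsets(points, keypoint):
    """Offset vector from each surface point to the keypoint."""
    return [tuple(k - x for k, x in zip(keypoint, p)) for p in points]

def norm(v):
    """Euclidean length of a 3D vector."""
    return math.sqrt(sum(x * x for x in v))

# Hypothetical toy object: three surface points and one keypoint.
points = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
keypoint = (0.5, 0.5, 0.5)

# Rotate the whole object (points and keypoint) by the same rotation.
R = rotation_z(math.pi / 3)
rot_points = [apply(R, p) for p in points]
rot_keypoint = apply(R, keypoint)

offsets = keypoint_offsets(points, keypoint)
rot_offsets = keypoint_offsets(rot_points, rot_keypoint)

# Equivariance: offsets of the rotated object equal the rotated offsets.
equivariant = all(
    math.isclose(a, b, abs_tol=1e-9)
    for o, ro in zip(offsets, rot_offsets)
    for a, b in zip(apply(R, o), ro)
)

# Invariance: offset lengths do not change under the rotation.
invariant = all(
    math.isclose(norm(o), norm(ro), abs_tol=1e-9)
    for o, ro in zip(offsets, rot_offsets)
)

print("offset directions are equivariant:", equivariant)
print("offset lengths are invariant:", invariant)
```

Both checks hold because rotation is linear and length-preserving: R(k - x) = Rk - Rx, and |Rv| = |v| for any rotation R.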