3D hand-object pose estimation is key to understanding how humans interact with their environment. Current hand-object pose estimation methods require detailed 3D labels, which are expensive and labor-intensive to obtain. To reduce this data collection burden, we propose a semi-supervised 3D hand-object pose estimation method built on two key techniques: pose dictionary learning and an object-oriented coordinate system. The pose dictionary learning module identifies infeasible poses by their reconstruction error, which allows unlabeled data to provide supervision signals. The object-oriented coordinate system makes 3D estimates equivariant to the camera perspective. We conduct experiments on the FPHA and HO-3D datasets. On FPHA, our method reduces estimation error by 19.5% for hands and 24.9% for objects relative to straightforward use of the labeled data, and it outperforms several baseline methods. Extensive experiments also validate the robustness of the proposed method.
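To make the reconstruction-error idea concrete, here is a minimal sketch of how a pose dictionary could score a candidate pose. It is not the authors' implementation: the abstract does not specify how the dictionary is learned or how poses are encoded, so this example assumes a fixed toy dictionary of unit-norm atoms and swaps in plain greedy matching pursuit as the sparse coder. The names `reconstruction_error`, `atoms`, and `n_steps` are hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def reconstruction_error(pose, atoms, n_steps=2):
    """Residual norm after n_steps of greedy matching pursuit.

    Assumption: `atoms` are unit-norm vectors. A small residual means the
    pose is well explained by the dictionary; a large one flags it as
    unlikely under the dictionary.
    """
    residual = list(pose)
    for _ in range(n_steps):
        # Pick the atom most correlated with what is still unexplained.
        best = max(atoms, key=lambda d: abs(dot(residual, d)))
        coef = dot(residual, best)
        residual = [r - coef * b for r, b in zip(residual, best)]
    return math.sqrt(dot(residual, residual))


# Hypothetical toy dictionary over a 4-dimensional "pose" vector.
atoms = [
    normalize([1.0, 1.0, 0.0, 0.0]),
    normalize([0.0, 0.0, 1.0, 1.0]),
]

feasible = [2.0, 2.0, 1.0, 1.0]      # lies in the span of the atoms
infeasible = [2.0, -2.0, 1.0, -1.0]  # orthogonal to every atom

err_feasible = reconstruction_error(feasible, atoms)
err_infeasible = reconstruction_error(infeasible, atoms)
print(err_feasible < err_infeasible)
```

In a semi-supervised setting, a score of this kind could be evaluated on poses predicted for unlabeled images and used as a penalty, since it needs no ground-truth annotation. Whether the paper uses it as a loss term, a filter, or something else is not stated in the abstract.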