Obtaining 3D annotations typically requires controlled environments or synthetic data, so the resulting 3D datasets generalize less well to real-world scenarios. To tackle this issue in the context of semi-supervised 3D hand shape and pose estimation, we propose the Pose Alignment network, which propagates 3D annotations from labelled frames to nearby unlabelled frames in sparsely annotated videos. We show that incorporating alignment supervision on pairs of labelled and unlabelled frames improves pose estimation accuracy. We also show that the proposed Pose Alignment network can effectively propagate annotations on unseen sparsely labelled videos without fine-tuning.