Estimating 3D hand and object pose from a single image is an extremely challenging problem: hands and objects are often self-occluded during interactions, and 3D annotations are scarce, since even humans cannot perfectly label the ground truth from a single image. To tackle these challenges, we propose a unified framework for estimating 3D hand and object poses with semi-supervised learning. We build a joint learning framework in which a Transformer performs explicit contextual reasoning between hand and object representations. Going beyond the limited 3D annotations in a single image, we leverage spatial-temporal consistency in large-scale hand-object videos as a constraint for generating pseudo labels in semi-supervised learning. Our method not only improves hand pose estimation on a challenging real-world dataset, but also substantially improves object pose estimation, which has fewer ground-truths per instance. By training with large-scale diverse videos, our model also generalizes better across multiple out-of-domain datasets. Project page and code: https://stevenlsw.github.io/Semi-Hand-Object
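
To make the contextual-reasoning idea concrete, below is a minimal, hypothetical PyTorch sketch of Transformer-style attention applied jointly to hand and object feature tokens. The module name HandObjectFusion, the token counts (21 hand tokens, 8 object tokens), and the feature dimension are illustrative assumptions and do not reflect the authors' released implementation.

    import torch
    import torch.nn as nn

    class HandObjectFusion(nn.Module):
        """Joint self-attention over hand and object tokens (illustrative sketch)."""
        def __init__(self, dim=256, heads=4):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.norm1 = nn.LayerNorm(dim)
            self.norm2 = nn.LayerNorm(dim)
            self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

        def forward(self, hand_feats, obj_feats):
            # Concatenate hand and object tokens so attention can reason across both,
            # e.g. an occluded fingertip can attend to the visible object surface.
            tokens = torch.cat([hand_feats, obj_feats], dim=1)    # (B, Nh+No, dim)
            attended, _ = self.attn(tokens, tokens, tokens)
            tokens = self.norm1(tokens + attended)
            tokens = self.norm2(tokens + self.mlp(tokens))
            n_hand = hand_feats.shape[1]
            return tokens[:, :n_hand], tokens[:, n_hand:]         # refined hand / object tokens

    # Example with random features standing in for backbone outputs (assumed shapes):
    hand, obj = torch.randn(2, 21, 256), torch.randn(2, 8, 256)
    hand_out, obj_out = HandObjectFusion()(hand, obj)

The refined hand and object tokens would then feed separate pose-regression heads; in the semi-supervised setting described above, predictions on unlabeled video frames that remain spatially and temporally consistent across time can be kept as pseudo labels, while inconsistent ones are discarded.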

