The ability to perceive 3D human bodies from a single image has a multitude of applications ranging from entertainment and robotics to neuroscience and healthcare. A fundamental challenge in human mesh recovery is collecting the ground truth 3D mesh targets required for training, which requires burdensome motion capture systems and is often limited to indoor laboratories. As a result, while progress has been made on benchmark datasets collected in these restrictive settings, models fail to generalize to real-world "in-the-wild" scenarios due to distribution shifts. We propose Domain Adaptive 3D Pose Augmentation (DAPA), a data augmentation method that enhances the model's generalization ability in in-the-wild scenarios. DAPA combines the strengths of methods based on synthetic datasets, which obtain direct supervision from synthesized meshes, with those of domain adaptation methods, which use ground truth 2D keypoints from the target dataset. We show quantitatively that finetuning with DAPA effectively improves results on the 3DPW and AGORA benchmarks. We further demonstrate the utility of DAPA on a challenging dataset curated from videos of real-world parent-child interactions.
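To make the combination of supervision signals concrete, the sketch below illustrates one plausible form of the training objective implied by the abstract: an L1 mesh loss against the synthesized (augmented) 3D targets plus a 2D keypoint reprojection loss against ground truth annotations from the target dataset. This is a minimal illustration under our own assumptions, not the paper's implementation; the function names (`weak_perspective_project`, `dapa_style_loss`) and the loss weights `lambda_3d`, `lambda_2d` are hypothetical.

```python
# Hedged sketch of the training signal described in the abstract: direct 3D
# supervision from synthesized meshes plus a 2D keypoint reprojection loss on
# real target-domain images. All names and weights are illustrative
# assumptions, not the paper's actual API.
import torch
import torch.nn.functional as F

def weak_perspective_project(joints_3d, scale, trans):
    """Project 3D joints (B, J, 3) to 2D with a weak-perspective camera.

    scale: (B,) per-sample scale; trans: (B, 2) per-sample translation.
    """
    return scale[:, None, None] * joints_3d[:, :, :2] + trans[:, None, :]

def dapa_style_loss(pred_verts, synth_verts,
                    pred_joints_3d, cam_scale, cam_trans,
                    gt_keypoints_2d, lambda_3d=1.0, lambda_2d=1.0):
    # Direct mesh supervision from the synthesized (augmented) sample.
    loss_3d = F.l1_loss(pred_verts, synth_verts)
    # Reprojection loss against ground truth 2D keypoints from the target set.
    pred_kp_2d = weak_perspective_project(pred_joints_3d, cam_scale, cam_trans)
    loss_2d = F.mse_loss(pred_kp_2d, gt_keypoints_2d)
    return lambda_3d * loss_3d + lambda_2d * loss_2d
```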