The ability to perceive 3D human bodies from a single image has a multitude of applications ranging from entertainment and robotics to neuroscience and healthcare. A fundamental challenge in human mesh recovery is collecting the ground-truth 3D mesh targets required for training, which demands burdensome motion capture systems and is often limited to indoor laboratories. As a result, while progress has been made on benchmark datasets collected in these restrictive settings, models fail to generalize to real-world ``in-the-wild'' scenarios due to distribution shifts. We propose Domain Adaptive 3D Pose Augmentation (DAPA), a data augmentation method that enhances a model's generalization to in-the-wild scenarios. DAPA combines the strengths of methods based on synthetic datasets, which provide direct supervision from synthesized meshes, and of domain adaptation methods, which exploit ground-truth 2D keypoints from the target dataset. We show quantitatively that finetuning with DAPA effectively improves results on the 3DPW and AGORA benchmarks. We further demonstrate the utility of DAPA on a challenging dataset curated from videos of real-world parent-child interactions.
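As a rough schematic of the combined supervision described above (the symbols and weights below are illustrative, not the paper's notation), finetuning with DAPA can be viewed as minimizing a mesh-reconstruction term on synthesized examples together with a 2D keypoint reprojection term on target-domain images:
\[
\mathcal{L} \;=\; \lambda_{\mathrm{mesh}}\,\mathcal{L}_{\mathrm{mesh}}\big(\hat{M},\, M_{\mathrm{syn}}\big) \;+\; \lambda_{\mathrm{2D}}\,\mathcal{L}_{\mathrm{2D}}\big(\Pi(\hat{J}_{3\mathrm{D}}),\, x_{\mathrm{2D}}\big),
\]
where $\hat{M}$ is the predicted mesh, $M_{\mathrm{syn}}$ a synthesized mesh target, $\Pi$ the camera projection of the predicted 3D joints $\hat{J}_{3\mathrm{D}}$, and $x_{\mathrm{2D}}$ the ground-truth 2D keypoints from the target dataset.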