End-to-end deep representation learning has achieved remarkable accuracy for monocular 3D human pose estimation, yet these models may fail on unseen poses when trained on limited, fixed data. This paper proposes a novel data augmentation method that (1) scales to synthesize a massive amount of training data (over 8 million valid 3D human poses with corresponding 2D projections) for training 2D-to-3D networks, and (2) effectively reduces dataset bias. Our method evolves a limited dataset to synthesize unseen 3D human skeletons, based on a hierarchical human representation and heuristics inspired by prior knowledge. Extensive experiments show that our approach not only achieves state-of-the-art accuracy on the largest public benchmark, but also generalizes significantly better to unseen and rare poses. Code, pre-trained models, and tools are available at this HTTPS URL.
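As an illustration only, the idea of evolving a small set of 3D poses into new 3D-2D training pairs can be sketched as follows. This is a minimal sketch under stated assumptions, not the released implementation: the five-joint skeleton, the `crossover` and `mutate` operators, the `is_valid` heuristic, and the camera parameters in `project` are all hypothetical stand-ins, since the abstract does not specify them.

```python
import math
import random

# Toy 5-joint kinematic tree (hypothetical, not the skeleton used in the paper):
# 0 = root, 1 = neck, 2 = head, 3 = left hand, 4 = right hand.
# Parents always appear before their children.
PARENT = [-1, 0, 1, 1, 1]
ZERO = (0.0, 0.0, 0.0)


def norm(v):
    """Euclidean length of a 3D vector."""
    return math.sqrt(sum(x * x for x in v))


def to_bones(joints):
    """Hierarchical form: each joint stored as an offset from its parent."""
    return [tuple(a - b for a, b in zip(joints[j], ZERO if p < 0 else joints[p]))
            for j, p in enumerate(PARENT)]


def to_joints(bones):
    """Inverse of to_bones: accumulate offsets down the tree."""
    joints = []
    for j, p in enumerate(PARENT):
        origin = ZERO if p < 0 else joints[p]
        joints.append(tuple(a + b for a, b in zip(bones[j], origin)))
    return joints


def crossover(bones_a, bones_b, rng):
    """Copy one bone direction from parent B, keeping parent A's bone length."""
    child = list(bones_a)
    j = rng.randrange(1, len(PARENT))
    scale = norm(bones_a[j]) / norm(bones_b[j])
    child[j] = tuple(x * scale for x in bones_b[j])
    return child


def mutate(bones, rng, max_angle=0.3):
    """Rotate one bone about the z axis by a small random angle (length-preserving)."""
    child = list(bones)
    j = rng.randrange(1, len(PARENT))
    t = rng.uniform(-max_angle, max_angle)
    x, y, z = child[j]
    child[j] = (x * math.cos(t) - y * math.sin(t),
                x * math.sin(t) + y * math.cos(t), z)
    return child


def is_valid(joints):
    """Placeholder validity heuristic (an assumption): head stays above the root."""
    return joints[2][1] > joints[0][1]


def project(joints, focal=1000.0, cam_dist=5.0):
    """Perspective projection with an assumed focal length and camera distance."""
    return [(focal * x / (z + cam_dist), focal * y / (z + cam_dist))
            for x, y, z in joints]


def evolve(population, generations, rng):
    """Grow the pose pool by keeping valid children of randomly chosen parents."""
    pool = [to_bones(p) for p in population]
    for _ in range(generations):
        a, b = rng.sample(pool, 2)
        child = mutate(crossover(a, b, rng), rng)
        if is_valid(to_joints(child)):
            pool.append(child)
    return [to_joints(b) for b in pool]


# Two hand-written seed poses standing in for a limited dataset.
pose_a = [(0.0, 0.0, 0.0), (0.0, 0.5, 0.0), (0.0, 0.8, 0.0),
          (-0.4, 0.5, 0.0), (0.4, 0.5, 0.0)]
pose_b = [(0.0, 0.0, 0.0), (0.0, 0.5, 0.0), (0.0, 0.8, 0.0),
          (-0.2, 0.2, 0.1), (0.2, 0.9, -0.1)]

rng = random.Random(0)
poses = evolve([pose_a, pose_b], generations=200, rng=rng)
pairs = [(project(p), p) for p in poses]  # (2D input, 3D target) training pairs
```

Storing each joint as an offset from its parent lets the operators change one limb without disturbing bone lengths elsewhere, which is one plausible reason to work with a hierarchical representation. A real validity check would presumably rely on anatomical joint-angle limits rather than the single constraint used here.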