Deep learning-based 3D human pose estimation performs best when trained on large amounts of labeled data, making combined learning from many datasets an important research direction. One obstacle to this endeavor is the discrepancy in skeleton formats across datasets, i.e., they do not label the same set of anatomical landmarks. There is little prior research on how best to supervise a single model with such discrepant labels. We show that simply using separate output heads for different skeletons results in inconsistent depth estimates and insufficient information sharing across skeletons. As a remedy, we propose a novel affine-combining autoencoder (ACAE) method that performs dimensionality reduction over the number of landmarks. The discovered latent 3D points capture the redundancy among skeletons, enabling enhanced information sharing when used for consistency regularization. Our approach scales to an extreme multi-dataset regime, in which we use 28 3D human pose datasets to supervise a single model that outperforms prior work on a range of benchmarks, including the challenging 3D Poses in the Wild (3DPW) dataset. Our code and models are available for research purposes.
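To make the ACAE idea more concrete, below is a minimal PyTorch sketch of an autoencoder whose encoder and decoder act over the landmark dimension and whose weight rows are constrained to sum to one, so latent points are affine (here, convex) combinations of input landmarks and reconstructions are affine combinations of latent points. The class name, the softmax parameterization of the sum-to-one constraint, and the landmark/latent counts are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AffineCombiningAutoencoder(nn.Module):
    """Sketch of an affine-combining autoencoder over the landmark axis.

    Encoder and decoder are linear maps across landmarks whose rows sum to 1,
    so each latent 3D point is a weighted average of input landmarks and each
    reconstructed landmark is a weighted average of latent points.
    """

    def __init__(self, num_joints: int = 100, num_latent: int = 32):
        super().__init__()
        # Unconstrained logits; softmax turns each row into sum-to-one weights
        # (a simple stand-in for the paper's affine-combination constraint).
        self.enc_logits = nn.Parameter(torch.zeros(num_latent, num_joints))
        self.dec_logits = nn.Parameter(torch.zeros(num_joints, num_latent))

    def encode(self, joints: torch.Tensor) -> torch.Tensor:
        # joints: (batch, num_joints, 3) -> (batch, num_latent, 3)
        weights = F.softmax(self.enc_logits, dim=-1)
        return weights @ joints

    def decode(self, latent: torch.Tensor) -> torch.Tensor:
        # latent: (batch, num_latent, 3) -> (batch, num_joints, 3)
        weights = F.softmax(self.dec_logits, dim=-1)
        return weights @ latent

    def forward(self, joints: torch.Tensor) -> torch.Tensor:
        return self.decode(self.encode(joints))


if __name__ == "__main__":
    acae = AffineCombiningAutoencoder(num_joints=100, num_latent=32)
    joints = torch.randn(8, 100, 3)  # dummy pose predictions
    recon = acae(joints)
    # Consistency-style penalty (illustrative only): reconstructed landmarks
    # should stay close to the predicted ones after the latent bottleneck.
    loss = F.l1_loss(recon, joints)
    print(recon.shape, loss.item())
```

In this sketch the bottleneck is over the number of landmarks rather than over feature channels, which is what lets a shared set of latent 3D points tie together predictions made in different skeleton formats during consistency regularization.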