4D human sensing and modeling are fundamental tasks in vision and graphics with numerous applications. With advances in sensors and algorithms, there is an increasing demand for more versatile datasets. In this work, we contribute HuMMan, a large-scale multi-modal 4D human dataset with 1000 human subjects, 400k sequences, and 60M frames. HuMMan has several appealing properties: 1) multi-modal data and annotations including color images, point clouds, keypoints, SMPL parameters, and textured meshes; 2) a popular mobile device is included in the sensor suite; 3) a set of 500 actions designed to cover fundamental movements; 4) multiple tasks such as action recognition, pose estimation, parametric human recovery, and textured mesh reconstruction are supported and evaluated. Extensive experiments on HuMMan voice the need for further study on challenges such as fine-grained action recognition, dynamic human mesh reconstruction, point cloud-based parametric human recovery, and cross-device domain gaps.