Human pose and shape estimation from RGB images is a highly sought-after alternative to marker-based motion capture, which is laborious, requires expensive equipment, and confines capture to laboratory environments. Monocular vision-based algorithms, however, still suffer from rotational ambiguities and are not ready for translation to healthcare applications, where high accuracy is paramount. Fusing data from multiple viewpoints could overcome these challenges, but current algorithms require further improvement to reach clinically acceptable accuracy. In this paper, we propose a learnable volumetric aggregation approach to reconstruct 3D human body pose and shape from calibrated multi-view images. We use a parametric representation of the human body, which makes our approach directly applicable to medical applications. Compared to previous approaches, our framework shows higher accuracy and greater promise for real-time prediction, given its cost efficiency.
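To make the idea of volumetric aggregation from calibrated views concrete, the following is a minimal sketch and not the method proposed here. It assumes each view provides a 2D feature map and a known 3x4 projection matrix, projects the centres of a voxel grid into every view, and averages the sampled features. The plain mean is a stand-in for the learnable aggregation, and the names `make_voxel_grid`, `project`, and `aggregate_views` are hypothetical.

```python
import numpy as np


def make_voxel_grid(center, side, n):
    """Return (n**3, 3) world coordinates of voxel centres in a cube."""
    axis = (np.arange(n) + 0.5) / n * side - side / 2.0
    xs, ys, zs = np.meshgrid(axis, axis, axis, indexing="ij")
    return np.stack([xs, ys, zs], axis=-1).reshape(-1, 3) + np.asarray(center)


def project(P, pts):
    """Project (N, 3) world points with a 3x4 matrix P.

    Returns pixel coordinates (N, 2) and the depth of each point (N,).
    """
    homo = np.concatenate([pts, np.ones((len(pts), 1))], axis=1)
    uvw = homo @ P.T
    depth = uvw[:, 2]
    # Avoid division by zero for points lying on the camera plane.
    safe = np.where(np.abs(depth) > 1e-9, depth, 1e-9)
    return uvw[:, :2] / safe[:, None], depth


def aggregate_views(feature_maps, projections, voxels):
    """Average per-view features sampled at each voxel's projection.

    feature_maps: (V, C, H, W), projections: (V, 3, 4), voxels: (N, 3).
    Returns (N, C). Assumption: an unweighted mean over the views that
    see the voxel replaces the learned aggregation.
    """
    _, C, H, W = feature_maps.shape
    total = np.zeros((len(voxels), C))
    count = np.zeros((len(voxels), 1))
    for fmap, P in zip(feature_maps, projections):
        uv, depth = project(P, voxels)
        u = np.rint(uv[:, 0]).astype(int)  # nearest-pixel column
        v = np.rint(uv[:, 1]).astype(int)  # nearest-pixel row
        visible = (depth > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
        total[visible] += fmap[:, v[visible], u[visible]].T
        count[visible] += 1
    # Voxels seen by no view keep a zero feature vector.
    return total / np.maximum(count, 1)
```

In a full pipeline the resulting per-voxel features would typically be passed to a 3D network that regresses the parameters of the body model. Nearest-pixel sampling is used here only for brevity; bilinear sampling would be the more common choice.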