We introduce an approach that accurately reconstructs 3D human poses and detailed 3D full-body geometric models from single images in real time. The key idea of our approach is a novel end-to-end multi-task deep learning framework that takes a single image and simultaneously predicts five outputs: a foreground segmentation mask, 2D joint positions, semantic body partitions, 3D part orientations, and UV coordinates (UV map). The multi-task network architecture not only generates more visual cues for reconstruction, but also makes each individual prediction more accurate. The CNN regressor is further combined with an optimization-based algorithm for accurate kinematic pose reconstruction and full-body shape modeling. We show that the real-time reconstruction achieves a fitting accuracy that has not been demonstrated before, especially for in-the-wild images. We demonstrate the results of our real-time 3D pose and human body reconstruction system on various challenging in-the-wild videos. Through quantitative evaluations and comparisons with state-of-the-art methods, we show that the system advances the frontier of 3D human body and pose reconstruction from single images.