We present a system for real-time RGBD-based estimation of 3D human pose. We use a parametric 3D deformable human mesh model (SMPL-X) as a representation and focus on real-time estimation of the body pose, hand pose, and facial expression parameters from a Kinect Azure RGB-D camera. We train estimators of the body pose and facial expression parameters. Both estimators use previously published landmark extractors as input and custom annotated datasets for supervision, while hand pose is estimated directly by a previously published method. We combine the predictions of these estimators into a temporally smooth human pose. We train the facial expression extractor on a large talking-face dataset, which we annotate with facial expression parameters. For the body pose, we collect and annotate a dataset of 56 people captured from a rig of 5 Kinect Azure RGB-D cameras and use it together with the large AMASS motion capture dataset. Our RGB-D body pose model outperforms state-of-the-art RGB-only methods and matches the accuracy of a slower RGB-D optimization-based solution. The combined system runs at 30 FPS on a server with a single GPU. The code will be available at https://saic-violet.github.io/rgbd-kinect-pose