For the current 3D human pose estimation task, a group of methods mainly learns 2D-to-3D projection rules from spatial and temporal correlations. However, earlier methods model the global features of all body joints in the time domain while ignoring the motion trajectories of individual joints. The recent work [29] observes that different joints move differently and handles the temporal relationship of each joint separately. However, we find that different joints exhibit the same movement trends under certain specific actions. Therefore, our proposed Fusionformer method introduces a self-trajectory module and a mutual-trajectory module on top of the spatio-temporal module. After that, the global spatio-temporal features and the local joint trajectory features are fused through a linear network in a parallel manner. To eliminate the influence of bad 2D poses on the 3D projections, we finally introduce a pose refinement network to balance the consistency of the 3D projections. In addition, we evaluate the proposed method on two benchmark datasets (Human3.6M, MPI-INF-3DHP). Compared with the baseline method PoseFormer, our method improves MPJPE by 2.4% and P-MPJPE by 4.3% on the Human3.6M dataset.
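To make the parallel fusion step concrete, below is a minimal PyTorch-style sketch of fusing global spatio-temporal features with local joint trajectory features through a linear network. This is not the paper's implementation; the class name FusionHead and all feature dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    """Hypothetical sketch: concatenate per-joint global spatio-temporal
    features with per-joint trajectory features (from the self-/mutual-
    trajectory modules) and fuse them with a single linear layer."""

    def __init__(self, dim_global: int, dim_traj: int, dim_out: int = 3):
        super().__init__()
        # One linear layer maps the concatenated features to 3D coordinates.
        self.fuse = nn.Linear(dim_global + dim_traj, dim_out)

    def forward(self, f_global: torch.Tensor, f_traj: torch.Tensor) -> torch.Tensor:
        # f_global: (batch, joints, dim_global) from the spatio-temporal module
        # f_traj:   (batch, joints, dim_traj) from the trajectory modules
        return self.fuse(torch.cat([f_global, f_traj], dim=-1))

# Usage with assumed dimensions (17 joints as in Human3.6M):
head = FusionHead(dim_global=256, dim_traj=128)
f_g = torch.randn(2, 17, 256)
f_t = torch.randn(2, 17, 128)
pose3d = head(f_g, f_t)  # (2, 17, 3): one 3D coordinate per joint
```

The two feature streams are computed in parallel branches and only combined at this fusion layer, which matches the parallel manner described above; the actual dimensions and any normalization are design choices not specified in this abstract.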