Temporal modeling is crucial for multi-frame human pose estimation. Most existing methods directly employ optical flow or deformable convolution to predict full-spectrum motion fields, which may introduce numerous irrelevant cues, such as a nearby person or the background. Without further effort to excavate meaningful motion priors, their results are suboptimal, especially in complicated spatiotemporal interactions. On the other hand, temporal differences can encode representative motion information that is potentially valuable for pose estimation but has not been fully exploited. In this paper, we present a novel multi-frame human pose estimation framework that employs temporal differences across frames to model dynamic contexts and engages mutual information objectives to facilitate the disentanglement of useful motion information. Specifically, we design a multi-stage Temporal Difference Encoder that performs incremental cascaded learning conditioned on multi-stage feature difference sequences to derive an informative motion representation. We further propose a Representation Disentanglement module from the mutual information perspective, which grasps discriminative, task-relevant motion signals by explicitly defining the useful and noisy constituents of the raw motion features and minimizing their mutual information. With these designs, our method ranks No.1 in the Crowd Pose Estimation in Complex Events Challenge on the benchmark dataset HiEve and achieves state-of-the-art performance on three benchmarks: PoseTrack2017, PoseTrack2018, and PoseTrack21.
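As a rough illustration of the feature difference sequences mentioned above, the sketch below computes element-wise differences between consecutive frame features, first for a single stage and then independently per stage. It is a minimal sketch under stated assumptions, not the authors' Temporal Difference Encoder: each frame's feature map is assumed to be flattened into a plain list of floats, and the names `temporal_differences` and `multi_stage_differences` are hypothetical. The learned, cascaded encoding and the mutual information objective are not modeled here.

```python
from typing import List

# Hypothetical layout: one flattened feature map per frame.
Feature = List[float]


def temporal_differences(frames: List[Feature]) -> List[Feature]:
    """Return element-wise differences between consecutive frame features.

    For T input frames this yields T - 1 difference vectors,
    d_t = f_{t+1} - f_t. Entries whose features are identical in two
    consecutive frames become exactly zero.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames to form a difference")
    length = len(frames[0])
    if any(len(f) != length for f in frames):
        raise ValueError("all frame features must have the same length")
    return [
        [b - a for a, b in zip(prev, curr)]
        for prev, curr in zip(frames, frames[1:])
    ]


def multi_stage_differences(stages: List[List[Feature]]) -> List[List[Feature]]:
    """Apply temporal differencing independently to each (hypothetical) backbone stage."""
    return [temporal_differences(stage_frames) for stage_frames in stages]


if __name__ == "__main__":
    clip = [[1.0, 2.0], [1.5, 2.0], [3.0, 1.0]]  # toy 3-frame, 2-dim features
    print(temporal_differences(clip))  # [[0.5, 0.0], [1.5, -1.0]]
```

In the toy clip the second feature entry is unchanged between the first two frames, so its difference is zero. This is the intuition behind using differences rather than raw features: content that stays constant across frames cancels out, while changing content remains.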