We introduce a novel representation learning method to disentangle pose-dependent and view-dependent factors from 2D human poses. The method trains a network using cross-view mutual information maximization (CV-MIM), which maximizes the mutual information between representations of the same pose performed from different viewpoints in a contrastive learning manner. We further propose two regularization terms to ensure disentanglement and smoothness of the learned representations. The resulting pose representations can be used for cross-view action recognition. To evaluate the power of the learned representations, in addition to the conventional fully-supervised action recognition settings, we introduce a novel task called single-shot cross-view action recognition. In this task, models are trained on actions from only a single viewpoint and evaluated on poses captured from all possible viewpoints. We evaluate the learned representations on standard action recognition benchmarks and show that (i) CV-MIM performs competitively with state-of-the-art models in the fully-supervised scenarios; (ii) CV-MIM outperforms other competing methods by a large margin in the single-shot cross-view setting; and (iii) the learned representations can significantly boost performance when the amount of supervised training data is reduced. Our code is made publicly available at https://github.com/google-research/google-research/tree/master/poem
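As a rough illustration of the contrastive objective described above, the following sketch implements a generic InfoNCE-style cross-view loss in PyTorch. It is not the released implementation: the function name cross_view_infonce, the temperature value, and the use of in-batch negatives are assumptions for illustration, and the paper's full CV-MIM objective additionally includes the two regularization terms mentioned in the abstract.

```python
import torch
import torch.nn.functional as F

def cross_view_infonce(z_a: torch.Tensor, z_b: torch.Tensor,
                       temperature: float = 0.1) -> torch.Tensor:
    """Illustrative InfoNCE-style cross-view contrastive loss (assumption,
    not the released CV-MIM code).

    z_a, z_b: (B, D) embeddings of the same B poses observed from two
    different viewpoints; (z_a[i], z_b[i]) form positive pairs, and all
    other in-batch pairs act as negatives.
    """
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    # (B, B) matrix of cosine similarities between the two views.
    logits = z_a @ z_b.t() / temperature
    targets = torch.arange(z_a.size(0), device=z_a.device)
    # Minimizing this cross-entropy maximizes a lower bound on the
    # mutual information between the two views' representations.
    return F.cross_entropy(logits, targets)
```

In this sketch, maximizing agreement between embeddings of the same pose across viewpoints while repelling embeddings of different poses is what encourages the pose-dependent factor to become view-invariant.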