Encouraged by the success of contrastive learning on image classification tasks, we propose a new self-supervised method for the structured regression task of 3D hand pose estimation. Contrastive learning uses unlabeled data for representation learning through a loss formulation that encourages the learned feature representations to be invariant under any image transformation. For 3D hand pose estimation, it is also desirable to have invariance to appearance transformations such as color jitter. However, the task requires equivariance under affine transformations, such as rotation and translation: if the input image is rotated, the predicted pose should rotate with it. To address this issue, we propose an equivariant contrastive objective and demonstrate its effectiveness in the context of 3D hand pose estimation. We experimentally investigate the impact of invariant and equivariant contrastive objectives and show that learning equivariant features leads to better representations for the task of 3D hand pose estimation. Furthermore, we show that standard ResNets of sufficient depth, trained on additional unlabeled data, attain improvements of up to 14.5% in PA-EPE on FreiHAND and thus achieve state-of-the-art performance without any task-specific, specialized architectures. Code and models are available at https://ait.ethz.ch/projects/2021/PeCLR/
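The abstract does not spell out the equivariant objective. The following is a minimal NumPy sketch under stated assumptions, not the released PeCLR code. It assumes a SimCLR-style NT-Xent loss and assumes equivariance is enforced by undoing each view's known image rotation in latent space, treating a D-dimensional embedding as D/2 points in the plane. The function names `nt_xent`, `rotate_2d`, and `equivariant_nt_xent` are hypothetical. Translation and scaling are left out for brevity.

```python
import numpy as np


def nt_xent(z1: np.ndarray, z2: np.ndarray, tau: float = 0.5) -> float:
    """SimCLR-style NT-Xent loss; row i of z1 and row i of z2 form a positive pair."""
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)              # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit vectors -> cosine similarity
    sim = z @ z.T / tau                               # (2N, 2N) scaled similarities
    np.fill_diagonal(sim, -np.inf)                    # a sample is never its own negative
    # Index of each sample's positive partner (the other view of the same image).
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable row-wise log-softmax, read off at the positive index.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(2 * n), pos].mean())


def rotate_2d(z: np.ndarray, angles: np.ndarray) -> np.ndarray:
    """Assumption: view each D-dim embedding as D/2 2D points and rotate them per sample."""
    n, d = z.shape
    pts = z.reshape(n, d // 2, 2)
    c, s = np.cos(angles)[:, None], np.sin(angles)[:, None]  # (N, 1), broadcast over points
    x, y = pts[..., 0], pts[..., 1]
    rotated = np.stack([c * x - s * y, s * x + c * y], axis=-1)
    return rotated.reshape(n, d)


def equivariant_nt_xent(z1, z2, angles1, angles2, tau: float = 0.5) -> float:
    """Hypothetical equivariant variant: undo each view's image rotation, then contrast."""
    return nt_xent(rotate_2d(z1, -angles1), rotate_2d(z2, -angles2), tau)
```

The plain `nt_xent` rewards embeddings that ignore the rotation (invariance). The equivariant variant instead lets the embedding rotate with the image, because the known rotation is inverted before the two views are compared. As a usage check, if two views' embeddings are the same base embedding rotated by their respective angles, `equivariant_nt_xent` returns exactly the loss of a perfectly aligned pair, `nt_xent(base, base)`.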