We introduce TempCLR, a new time-coherent contrastive learning approach for the structured regression task of 3D hand reconstruction. Unlike previous time-contrastive methods for hand pose estimation, our framework considers temporal consistency in its augmentation scheme, and accounts for the differences of hand poses along the temporal direction. Our data-driven method leverages unlabelled videos and a standard CNN, without relying on synthetic data, pseudo-labels, or specialized architectures. Our approach improves the performance of fully-supervised hand reconstruction methods by 15.9% and 7.6% in PA-V2V on the HO-3D and FreiHAND datasets respectively, thus establishing new state-of-the-art performance. Finally, we demonstrate that our approach produces smoother hand reconstructions through time and is more robust to heavy occlusions than the previous state-of-the-art, as we show both quantitatively and qualitatively. Our code and models will be available at https://eth-ait.github.io/tempclr.
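To make the core idea concrete, below is a minimal, hypothetical sketch in PyTorch of a time-coherent contrastive (InfoNCE-style) objective. It is not TempCLR's actual implementation: the function name `time_contrastive_loss`, the temperature value, and the sampling convention (temporally adjacent frames as positives, temporally distant frames as negatives) are illustrative assumptions based only on the description above.

```python
# Illustrative sketch only: a time-contrastive InfoNCE loss where
# temporally close frames of an unlabelled video form positive pairs
# and temporally distant frames serve as negatives. This is an
# assumption-based example, not the paper's released code.
import torch
import torch.nn.functional as F


def time_contrastive_loss(anchor, positive, negatives, temperature=0.07):
    """anchor, positive: (B, D) embeddings of temporally close frames.
    negatives: (B, K, D) embeddings of temporally distant frames."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)

    # Cosine similarity of each anchor with its positive: (B, 1)
    pos_sim = (anchor * positive).sum(dim=-1, keepdim=True)
    # Cosine similarity of each anchor with its K negatives: (B, K)
    neg_sim = torch.einsum('bd,bkd->bk', anchor, negatives)

    # Row-wise softmax classification: the positive sits at index 0.
    logits = torch.cat([pos_sim, neg_sim], dim=1) / temperature
    targets = torch.zeros(anchor.size(0), dtype=torch.long,
                          device=anchor.device)
    return F.cross_entropy(logits, targets)
```

In such a setup, the embeddings would come from a standard CNN backbone applied to video frames, consistent with the abstract's claim that no specialized architecture is required; how positives and negatives are actually sampled along the temporal direction is the method-specific detail the paper itself defines.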