This work presents improvements in monocular hand shape estimation by building on recent advances in unsupervised learning. We extend momentum contrastive learning and contribute a structured collection of hand images, well suited for visual representation learning, which we call HanCo. We find that the representation learned by established contrastive learning methods can be improved significantly by exploiting advanced background removal techniques and multi-view information. These allow us to generate more diverse instance pairs than those obtained by the augmentations commonly used in exemplar-based approaches. Our method leads to a representation better suited to the hand shape estimation task, yielding a 4.7% reduction in mesh error and a 3.6% improvement in F-score compared to an ImageNet-pretrained baseline. We make our benchmark dataset publicly available to encourage further research in this direction.
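To make the contrastive objective concrete: MoCo-style methods score a query embedding against one positive (here, potentially another camera view or background composite of the same hand pose, rather than a synthetic augmentation) and many negatives, using the InfoNCE loss. The sketch below is illustrative only; the function name and plain-list representation are assumptions, not the paper's implementation, and embeddings are taken to be L2-normalised.

```python
import math

def info_nce(query, positive, negatives, temperature=0.07):
    """InfoNCE loss for a single query embedding.

    query, positive: L2-normalised feature vectors forming an instance pair
    (e.g. two camera views of the same hand pose).
    negatives: embeddings of other instances (e.g. a MoCo memory queue).
    Lower loss means the query is closer to its positive than to negatives.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Similarity of the query to the positive, then to each negative.
    logits = [dot(query, positive)] + [dot(query, n) for n in negatives]
    logits = [s / temperature for s in logits]

    # Numerically stable cross-entropy with the positive as the target class.
    m = max(logits)
    exps = [math.exp(s - m) for s in logits]
    return -math.log(exps[0] / sum(exps))
```

With a matching pair (`query == positive`) and an orthogonal negative, the loss is near zero; swapping positive and negative drives it up, which is the signal that pulls multi-view pairs together in embedding space.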