Pose estimation of the human body/hand is a fundamental problem in computer vision, and learning-based solutions require a large amount of annotated data. Given limited annotation budgets, a common approach to increasing label efficiency is Active Learning (AL), which selects examples with the highest value to annotate, but choosing the selection strategy is often nontrivial. In this work, we improve Active Learning for the problem of 3D pose estimation in a multi-view setting, which is of increasing importance in many application scenarios. We develop a framework that allows us to efficiently extend existing single-view AL strategies, and then propose two novel AL strategies that make full use of multi-view geometry. Moreover, we demonstrate additional performance gains by incorporating predicted pseudo-labels, which is a form of self-training. Our system significantly outperforms baselines in 3D body and hand pose estimation on two large-scale benchmarks: CMU Panoptic Studio and InterHand2.6M. Notably, on CMU Panoptic Studio, we are able to match the performance of a fully-supervised model using only 20% of labeled training data.
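The abstract does not say how an existing single-view AL strategy is extended to the multi-view setting. The sketch below shows one simple illustrative possibility, not the authors' framework. It assumes each camera view of a frame already has a single-view uncertainty score (higher means less confident). It aggregates those scores into one frame-level score, selects the most uncertain frames for annotation, and treats confident unselected frames as pseudo-label candidates. The function names, the mean/max aggregation choices, and the pseudo-label threshold are all hypothetical.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical input: frame id -> per-view uncertainty scores (higher = less confident).
ViewScores = Dict[str, List[float]]


def aggregate(view_scores: List[float], how: str = "mean") -> float:
    """Lift per-view (single-view) uncertainty scores to one multi-view frame score."""
    if not view_scores:
        raise ValueError("each frame needs at least one view score")
    if how == "mean":
        return mean(view_scores)
    if how == "max":
        return max(view_scores)
    raise ValueError(f"unknown aggregation: {how}")


def select_for_annotation(scores: ViewScores, budget: int, how: str = "mean") -> List[str]:
    """Return the `budget` frames with the highest aggregated uncertainty.

    Python's sort is stable, so ties keep the input order of the frames.
    """
    ranked = sorted(scores, key=lambda f: aggregate(scores[f], how), reverse=True)
    return ranked[:budget]


def select_pseudo_labels(
    scores: ViewScores, annotated: List[str], threshold: float, how: str = "mean"
) -> List[str]:
    """Among frames not sent for annotation, keep confident ones as pseudo-label candidates."""
    chosen = set(annotated)
    return [
        f for f in scores
        if f not in chosen and aggregate(scores[f], how) <= threshold
    ]
```

The aggregation choice matters. With `"mean"`, a frame is selected only when it is uncertain in most views. With `"max"`, a single badly predicted view (for example, one with heavy occlusion) is enough to make the frame a candidate for annotation.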