Pose estimation of the human body and hands is a fundamental problem in computer vision, and learning-based solutions require a large amount of annotated data. In this work, we improve the efficiency of the data annotation process for 3D pose estimation problems with Active Learning (AL) in a multi-view setting. AL selects examples with the highest value to annotate under limited annotation budgets (time and cost), but choosing the selection strategy is often nontrivial. We present a framework to efficiently extend existing single-view AL strategies. We then propose two novel AL strategies that make full use of multi-view geometry. Moreover, we demonstrate additional performance gains by incorporating pseudo-labels computed during the AL process, which is a form of self-training. Our system significantly outperforms simulated annotation baselines in 3D body and hand pose estimation on two large-scale benchmarks: CMU Panoptic Studio and InterHand2.6M. Notably, on CMU Panoptic Studio, we are able to reduce the turn-around time by 60% and annotation cost by 80% when compared to the conventional annotation process.