Estimating the articulated 3D hand-object pose from a single RGB image is a highly ambiguous and challenging problem that requires large-scale datasets containing diverse hand poses, object poses, and camera viewpoints. Most real-world datasets lack this diversity. In contrast, synthetic datasets can easily provide vast diversity, but learning from them is inefficient and incurs a heavy training cost. To address these issues, we propose ArtiBoost, a lightweight online data enrichment method that boosts articulated hand-object pose estimation from the data perspective. ArtiBoost is used alongside a real-world source dataset. During training, ArtiBoost alternates between data exploration and synthesis. It covers various hand-object poses and camera viewpoints based on a Compositional hand-object Configuration and Viewpoint space (CCV-space), and it adaptively enriches the samples that are currently hard to discern through a mining strategy. We apply ArtiBoost to a simple learning baseline network and demonstrate performance gains on several hand-object benchmarks. As an illustrative example, with ArtiBoost, even a simple baseline network can outperform the previous Transformer-based state of the art on the HO3D dataset. Our code is available at https://github.com/MVIG-SJTU/ArtiBoost.
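The alternation between exploration and mining described above can be illustrated with a minimal sketch. It is not the released ArtiBoost implementation: the function names, the discretized space, the linear error-to-weight mapping, and the weight bounds are all assumptions made for illustration, and rendering and network training are replaced by a caller-supplied `evaluate` stub.

```python
import random
from itertools import product


def build_ccv_space(objects, hand_poses, viewpoints):
    """Enumerate a toy discretized space of (object, hand pose, viewpoint) triples."""
    return list(product(objects, hand_poses, viewpoints))


def sample_indices(weights, k, rng):
    """Exploration: draw k configuration indices in proportion to their weights."""
    return rng.choices(range(len(weights)), weights=weights, k=k)


def update_weights(weights, errors, low=0.5, high=2.0):
    """Mining: map each observed error linearly to a weight in [low, high].

    Configurations without an observed error keep their current weight.
    The linear mapping and the bounds are illustrative assumptions.
    """
    if not errors:
        return list(weights)
    min_err, max_err = min(errors.values()), max(errors.values())
    span = max_err - min_err
    new_weights = list(weights)
    for idx, err in errors.items():
        # If all observed errors are equal, place them at the midpoint.
        t = (err - min_err) / span if span > 0 else 0.5
        new_weights[idx] = low + t * (high - low)
    return new_weights


def run_epochs(space, evaluate, epochs, k, seed=0):
    """Alternate exploration and mining for a number of epochs.

    `evaluate` is a hypothetical stand-in for synthesizing a sample,
    training on it, and measuring the per-sample error.
    """
    rng = random.Random(seed)
    weights = [1.0] * len(space)
    for _ in range(epochs):
        picked = sample_indices(weights, k, rng)
        errors = {i: evaluate(space[i]) for i in set(picked)}
        weights = update_weights(weights, errors)
    return weights
```

In this sketch, configurations that produce larger errors receive larger sampling weights in the next epoch, so harder hand-object poses and viewpoints are drawn more often, while unvisited configurations keep their previous weight.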