Fully supervised human mesh recovery methods are data-hungry and generalize poorly because 3D-annotated benchmark datasets are limited in availability and diversity. Recent progress in self-supervised human mesh recovery has come from synthetic-data-driven training paradigms, in which the model is trained on synthetic pairs of 2D representations (e.g., 2D keypoints and segmentation masks) and 3D meshes. However, synthetic dense correspondence maps (i.e., IUV) have rarely been explored, because the domain gap between synthetic training data and real testing data is hard to address for dense 2D representations. To alleviate this domain gap on IUV, we propose cross-representation alignment, which exploits the complementary information from a robust but sparse representation (2D keypoints). Specifically, the alignment errors between the initial mesh estimate and both 2D representations are fed into the regressor and dynamically corrected in the subsequent mesh regression. This adaptive cross-representation alignment explicitly learns from the deviations and captures complementary information: robustness from the sparse representation and richness from the dense representation. We conduct extensive experiments on multiple standard benchmark datasets and demonstrate competitive results, a step towards reducing the annotation effort needed to produce state-of-the-art models in human mesh estimation.
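The feedback signal described above can be illustrated with a minimal sketch. This is not the authors' implementation: the weak-perspective camera, the function names (`project_weak_perspective`, `alignment_errors`, `build_feedback`), and the simplification that each dense IUV pixel has already been matched to one mesh vertex are all assumptions made for illustration. The sketch only shows how alignment errors from the sparse and dense representations could be computed against an initial mesh estimate and concatenated into one vector for a regressor.

```python
from typing import List, Sequence, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]


def project_weak_perspective(points: Sequence[Point3D], scale: float,
                             trans: Point2D) -> List[Point2D]:
    """Project 3D points to the image plane (assumed weak-perspective camera)."""
    return [(scale * x + trans[0], scale * y + trans[1]) for x, y, _ in points]


def alignment_errors(observed: Sequence[Point2D],
                     projected: Sequence[Point2D]) -> List[float]:
    """Flattened per-point 2D offsets: observed minus projected."""
    errors: List[float] = []
    for (ox, oy), (px, py) in zip(observed, projected):
        errors.extend([ox - px, oy - py])
    return errors


def build_feedback(joints_3d: Sequence[Point3D], keypoints_2d: Sequence[Point2D],
                   vertices_3d: Sequence[Point3D], dense_pixels_2d: Sequence[Point2D],
                   scale: float, trans: Point2D) -> List[float]:
    """Concatenate sparse (keypoint) and dense (IUV-derived) alignment errors.

    Hypothetical simplification: dense_pixels_2d[i] is the image location that
    the IUV map associates with vertices_3d[i].
    """
    sparse = alignment_errors(
        keypoints_2d, project_weak_perspective(joints_3d, scale, trans))
    dense = alignment_errors(
        dense_pixels_2d, project_weak_perspective(vertices_3d, scale, trans))
    return sparse + dense


# Toy example: two joints, one dense correspondence, initial camera guess.
feedback = build_feedback(
    joints_3d=[(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],
    keypoints_2d=[(0.5, 0.0), (2.5, 0.0)],
    vertices_3d=[(0.0, 1.0, 1.0)],
    dense_pixels_2d=[(0.0, 2.5)],
    scale=2.0,
    trans=(0.0, 0.0),
)
print(feedback)  # [0.5, 0.0, 0.5, 0.0, 0.0, 0.5]
```

In a setup like the one the abstract describes, a vector of this kind would be passed to the regressor together with image features, so that the next mesh estimate can correct the observed deviations. How the errors are encoded and weighted in the actual method is not specified here.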