In this work, we establish dense correspondences between an RGB image and a surface-based representation of the human body, a task we refer to as dense human pose estimation. We first gather dense correspondences for 50K persons appearing in the COCO dataset by introducing an efficient annotation pipeline. We then use our dataset to train CNN-based systems that deliver dense correspondence 'in the wild', namely in the presence of background, occlusions and scale variations. We improve our training set's effectiveness by training an 'inpainting' network that can fill in missing ground-truth values, and report clear improvements over the best results achievable in the past. We experiment with fully-convolutional networks and region-based models and observe the superiority of the latter; we further improve accuracy through cascading, obtaining a system that delivers highly accurate results in real time. Supplementary materials and videos are provided on the project page, http://densepose.org.