The number and quality of datasets and tools available for hand pose and shape estimation attest to the significant progress the field has made. We find, however, that there is still room for improvement on both fronts, and beyond. Even the highest-quality datasets reported to date have shortcomings in their annotations. Tools that could help address these shortcomings exist in the literature, yet they have not been considered so far. To demonstrate how these gaps can be bridged, we take one such publicly available, multi-camera dataset of hands (InterHand2.6M) and perform effective image-based refinement of its imperfect ground truth annotations, yielding a better dataset. The image-based refinement is achieved through raytracing, a method that has not previously been applied to related problems and that we show to be superior to the approximate alternatives employed in the past. Because reliable ground truth is lacking, we resort to realistic synthetic data to show that the improvement we achieve is indeed significant, both qualitatively and quantitatively.