We propose a model-based generative loss, built on a volumetric hand model, for training hand pose estimators on depth images. This additional loss makes it possible to train a hand pose estimator that accurately infers the full set of 21 hand keypoints while using supervision for only 6 easy-to-annotate keypoints (the fingertips and the wrist). We show that our partially supervised method achieves results comparable to those of fully supervised methods that enforce articulation consistency. Moreover, we demonstrate for the first time that such an approach can be used to train on datasets with erroneous annotations, i.e., "ground truth" with notable measurement errors, while obtaining predictions that explain the depth images better than the given "ground truth".
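The following is a minimal sketch of how a partially supervised objective of this kind could be assembled; it is an illustration under stated assumptions, not the authors' implementation. The keypoint index layout (`SUPERVISED_IDX`), the squared-error and absolute-error forms, the `weight` parameter, and all function names are hypothetical. The depth image rendered from the volumetric hand model is taken as a given input, since the abstract does not describe the renderer.

```python
import numpy as np

NUM_KEYPOINTS = 21
# Hypothetical index layout: 0 = wrist, 4/8/12/16/20 = fingertips.
# The actual ordering depends on the dataset and is an assumption here.
SUPERVISED_IDX = np.array([0, 4, 8, 12, 16, 20])


def keypoint_loss(pred, target, idx=SUPERVISED_IDX):
    """Mean squared distance over the supervised keypoints only.

    pred, target: arrays of shape (21, 3). Annotations for keypoints
    outside `idx` are never read, so they may be missing or unreliable.
    """
    diff = pred[idx] - target[idx]
    return float(np.mean(np.sum(diff ** 2, axis=-1)))


def generative_loss(rendered_depth, observed_depth):
    """Mean absolute discrepancy between a depth image rendered from the
    hand model and the observed depth image (an assumed stand-in for the
    model-based generative term)."""
    return float(np.mean(np.abs(rendered_depth - observed_depth)))


def total_loss(pred, target, rendered_depth, observed_depth, weight=1.0):
    """Supervised term on 6 keypoints plus a weighted generative term."""
    return keypoint_loss(pred, target) + weight * generative_loss(
        rendered_depth, observed_depth
    )
```

In this sketch the 15 unannotated keypoints receive no direct supervision; they are constrained only through the generative term, which penalizes poses whose rendered depth disagrees with the observation.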