Despite the significant progress that depth-based 3D hand pose estimation methods have made in recent years, they still require a large amount of labeled training data to achieve high accuracy. However, collecting such data is both costly and time-consuming. To tackle this issue, we propose a semi-supervised method to significantly reduce the dependence on labeled training data. The proposed method consists of two identical networks trained jointly: a teacher network and a student network. The teacher network is trained using both the available labeled and unlabeled samples. It leverages the unlabeled samples via a loss formulation that encourages estimation equivariance under a set of affine transformations. The student network is trained using the unlabeled samples with their pseudo-labels provided by the teacher network. For inference at test time, only the student network is used. Extensive experiments demonstrate that the proposed method outperforms the state-of-the-art semi-supervised methods by large margins.
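The two training signals described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact loss, so the squared-distance form, the 2D point inputs, the toy estimators, and all function names below are assumptions chosen for illustration. The actual method operates on depth images and 3D joint positions.

```python
import math

def affine_2d(points, angle=0.0, scale=1.0, shift=(0.0, 0.0)):
    """Rotate by `angle` (radians), scale uniformly, then translate 2D points."""
    c, s = math.cos(angle), math.sin(angle)
    return [(scale * (c * x - s * y) + shift[0],
             scale * (s * x + c * y) + shift[1]) for x, y in points]

def mean_sq_dist(a, b):
    """Mean squared Euclidean distance between two equal-length point lists."""
    return sum((ax - bx) ** 2 + (ay - by) ** 2
               for (ax, ay), (bx, by) in zip(a, b)) / len(a)

def equivariance_loss(estimator, sample, transforms):
    """Assumed loss form: penalize estimator(T(x)) differing from T(estimator(x)).

    No labels are needed, so this term can be computed on unlabeled samples.
    """
    base = estimator(sample)
    total = 0.0
    for params in transforms:
        pred_on_transformed = estimator(affine_2d(sample, **params))
        transformed_pred = affine_2d(base, **params)
        total += mean_sq_dist(pred_on_transformed, transformed_pred)
    return total / len(transforms)

def pseudo_label_loss(student, teacher, sample):
    """Student is supervised by the teacher's prediction on an unlabeled sample."""
    return mean_sq_dist(student(sample), teacher(sample))

# Hypothetical toy estimators standing in for the networks.
def centroid_estimator(points):
    """Returns the centroid, which is equivariant under affine maps."""
    n = len(points)
    return [(sum(x for x, _ in points) / n, sum(y for _, y in points) / n)]

def rightmost_estimator(points):
    """Returns the point with the largest x, which is not rotation-equivariant."""
    return [max(points, key=lambda p: p[0])]

sample = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0)]
transforms = [
    {"angle": math.pi},
    {"angle": 0.5, "scale": 1.5, "shift": (1.0, -2.0)},
]

loss_equivariant = equivariance_loss(centroid_estimator, sample, transforms)
loss_violating = equivariance_loss(rightmost_estimator, sample, transforms)
loss_student = pseudo_label_loss(rightmost_estimator, centroid_estimator, sample)
```

In this sketch the centroid estimator incurs an equivariance loss of essentially zero, while the rightmost-point estimator is penalized because its output does not follow the rotation. In training, a term of this kind would be added to the teacher's supervised loss, and the pseudo-label term would drive the student.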