Hand pose estimation is a critical component of most interactive extended reality and gesture recognition systems, yet contemporary approaches are not optimized for computational and memory efficiency. In this paper, we propose a tiny deep neural network in which a subset of layers is applied recursively to refine its previous estimates. During these iterative refinements, we employ learned gating criteria to decide whether to exit the weight-sharing loop, which allows the model to adapt its computation per sample. To gate efficiently at each iteration, the network is trained to be aware of the uncertainty in its current predictions and estimates the variance of its keypoint estimates after each loop. Additionally, we investigate how effectively end-to-end and progressive training protocols maximize the capacity of our recursive structure. With the proposed setting, our method consistently outperforms state-of-the-art 2D/3D hand pose estimation approaches in both accuracy and efficiency on widely used benchmarks.
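The control flow of such an adaptive recursive refinement can be illustrated with a minimal sketch. It is not the network proposed here: `recursive_refine`, `make_toy_block`, `max_iters`, and `var_threshold` are hypothetical names introduced for illustration, the shared layers are replaced by a toy function that moves each keypoint halfway toward a known target, and the learned gating criterion is replaced by a fixed threshold on the mean predicted variance.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Keypoints = List[Tuple[float, float]]
# A block maps current keypoints to (refined keypoints, per-keypoint variance).
Block = Callable[[Keypoints], Tuple[Keypoints, List[float]]]


@dataclass
class RefineResult:
    keypoints: Keypoints
    variances: List[float]
    iterations: int


def recursive_refine(
    shared_block: Block,
    initial: Keypoints,
    max_iters: int = 8,
    var_threshold: float = 0.05,
) -> RefineResult:
    """Reuse one block repeatedly and exit early once it reports low variance.

    Assumption: a fixed threshold on the mean variance stands in for the
    learned gating criterion described in the abstract.
    """
    keypoints = initial
    variances = [float("inf")] * len(initial)
    for step in range(1, max_iters + 1):
        # The same block (same weights) is applied at every iteration.
        keypoints, variances = shared_block(keypoints)
        mean_var = sum(variances) / len(variances)
        if mean_var < var_threshold:
            return RefineResult(keypoints, variances, step)
    return RefineResult(keypoints, variances, max_iters)


def make_toy_block(target: Keypoints) -> Block:
    """Toy stand-in for the shared layers (hypothetical, for illustration only).

    Each call moves every keypoint halfway toward `target` and reports the
    remaining squared distance as that keypoint's variance.
    """
    def block(kps: Keypoints) -> Tuple[Keypoints, List[float]]:
        refined = [
            ((x + tx) / 2, (y + ty) / 2)
            for (x, y), (tx, ty) in zip(kps, target)
        ]
        variances = [
            (x - tx) ** 2 + (y - ty) ** 2
            for (x, y), (tx, ty) in zip(refined, target)
        ]
        return refined, variances
    return block


target = [(1.0, 1.0), (2.0, 0.0)]
block = make_toy_block(target)

easy = recursive_refine(block, [(1.1, 1.0), (2.0, 0.1)])
hard = recursive_refine(block, [(5.0, 5.0), (-2.0, 4.0)])
print(easy.iterations)  # 1: already close, exits after the first pass
print(hard.iterations)  # 5: needs more passes before the variance is low
```

In this toy setting the easy sample leaves the loop after a single pass while the hard sample uses five, which is the per-sample adaptation the gating is meant to provide; `max_iters` bounds the worst-case cost.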