In human and hand pose estimation, heatmaps are a crucial intermediate representation for a body or hand keypoint. Two popular methods of decoding a heatmap into a final joint coordinate are taking an argmax, as in heatmap detection, and applying a softmax followed by an expectation, as in integral regression. Integral regression is learnable end-to-end, but has lower accuracy than detection. This paper uncovers an induced bias in integral regression that results from combining the softmax with the expectation operation. This bias often forces the network to learn degenerately localized heatmaps, obscuring the keypoint's true underlying distribution and leading to lower accuracy. On the training side, by investigating the gradients of integral regression, we show that its implicit guidance for updating the heatmap makes it slower to converge than detection. To counter these two limitations, we propose Bias Compensated Integral Regression (BCIR), an integral regression-based framework that compensates for the bias. BCIR also incorporates a Gaussian prior loss to speed up training and improve prediction accuracy. Experimental results on both human body and hand benchmarks show that BCIR is faster to train and more accurate than the original integral regression, making it competitive with state-of-the-art detection methods.
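The two decoding schemes and the induced bias can be illustrated with a small NumPy sketch. This is not the paper's implementation and does not show BCIR's compensation step. The function names, the 64x64 heatmap size, the Gaussian width, and the `beta` scaling factor are all assumptions chosen for illustration.

```python
import numpy as np


def decode_argmax(heatmap):
    """Detection-style decoding: (x, y) of the largest heatmap value."""
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return float(x), float(y)


def decode_soft_argmax(heatmap, beta=1.0):
    """Integral-regression-style decoding: softmax, then expected coordinate.

    `beta` is a hypothetical sharpening factor applied before the softmax.
    """
    h, w = heatmap.shape
    logits = beta * heatmap.ravel()
    logits = logits - logits.max()          # subtract max for numerical stability
    probs = np.exp(logits)
    probs = (probs / probs.sum()).reshape(h, w)
    x = float((probs.sum(axis=0) * np.arange(w)).sum())  # E[x] from the x-marginal
    y = float((probs.sum(axis=1) * np.arange(h)).sum())  # E[y] from the y-marginal
    return x, y


def gaussian_heatmap(h, w, cx, cy, sigma=2.0):
    """Synthetic heatmap: an unnormalized Gaussian bump centered at (cx, cy)."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))


# Keypoint placed away from the heatmap center, which sits at x = y = 31.5.
heatmap = gaussian_heatmap(64, 64, cx=10, cy=12)

hard = decode_argmax(heatmap)                     # recovers the peak location
soft = decode_soft_argmax(heatmap)                # pulled toward the center
soft_sharp = decode_soft_argmax(heatmap, beta=50.0)  # sharp heatmap, little bias

print("argmax:           ", hard)
print("soft-argmax:      ", soft)
print("sharp soft-argmax:", soft_sharp)
```

Because the softmax assigns non-zero probability to every pixel, the background mass drags the expectation toward the heatmap center unless the heatmap is extremely peaked. In this toy setting, that is one way to see why a network trained through softmax and expectation is pushed toward degenerately localized heatmaps.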