2D heatmap-based approaches have dominated Human Pose Estimation (HPE) for years owing to their high performance. However, the long-standing quantization error of 2D heatmap-based methods leads to several well-known drawbacks: 1) performance on low-resolution inputs is limited; 2) multiple costly upsampling layers are required to raise the feature-map resolution for higher localization precision; 3) extra post-processing is needed to reduce the quantization error. To address these issues, we explore a new scheme, called \textit{SimCC}, which reformulates HPE as two classification tasks, one for the horizontal coordinate and one for the vertical coordinate. SimCC uniformly divides each pixel into several bins, thus achieving \emph{sub-pixel} localization precision and low quantization error. As a result, SimCC can omit additional refinement post-processing and, under certain settings, exclude upsampling layers, resulting in a simpler and more effective pipeline for HPE. Extensive experiments on the COCO, CrowdPose, and MPII datasets show that SimCC outperforms its heatmap-based counterparts, especially in low-resolution settings, where the margin is large.
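The effect of dividing each pixel into bins can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `bins_per_pixel` parameter, and the choice of decoding to the bin centre are assumptions made for illustration, and a value of `bins_per_pixel` below 1 is used here only to mimic a downsampled heatmap grid. Each coordinate is treated as its own 1D classification target, and the decoded position is the centre of the highest-scoring bin.

```python
import math


def encode_coordinate(coord, bins_per_pixel):
    """Return the class index of the bin containing a continuous pixel coordinate."""
    return math.floor(coord * bins_per_pixel)


def decode_coordinate(bin_index, bins_per_pixel):
    """Return the centre of a bin, expressed in input-pixel units."""
    return (bin_index + 0.5) / bins_per_pixel


def decode_keypoint(x_scores, y_scores, bins_per_pixel):
    """Decode one keypoint from two independent 1D score vectors via argmax."""
    x_bin = max(range(len(x_scores)), key=x_scores.__getitem__)
    y_bin = max(range(len(y_scores)), key=y_scores.__getitem__)
    return (
        decode_coordinate(x_bin, bins_per_pixel),
        decode_coordinate(y_bin, bins_per_pixel),
    )


def worst_case_error(length, bins_per_pixel, samples=10_000):
    """Largest encode/decode round-trip error over evenly spaced coordinates in [0, length)."""
    worst = 0.0
    for i in range(samples):
        coord = length * i / samples
        recovered = decode_coordinate(
            encode_coordinate(coord, bins_per_pixel), bins_per_pixel
        )
        worst = max(worst, abs(recovered - coord))
    return worst


if __name__ == "__main__":
    # 0.25 mimics a grid at 1/4 of the input resolution (an illustrative setting);
    # 1 is one bin per pixel; 2 and 4 split each pixel into several bins.
    for bins_per_pixel in (0.25, 1, 2, 4):
        error = worst_case_error(64, bins_per_pixel)
        print(f"bins per pixel = {bins_per_pixel}: worst-case error = {error:.3f} px")
```

In this sketch the worst-case round-trip error is half a bin width, i.e. `0.5 / bins_per_pixel` pixels, so it drops below half a pixel as soon as each pixel is split into more than one bin, whereas a coarser grid cannot localize better than half of its cell size without extra refinement.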