A high-resolution network exhibits remarkable capability in extracting multi-scale features for human pose estimation, but fails to capture long-range interactions between joints and has high computational complexity. To address these problems, we present a Dynamic lightweight High-Resolution Network (Dite-HRNet), which can efficiently extract multi-scale contextual information and model long-range spatial dependency for human pose estimation. Specifically, we propose two methods, dynamic split convolution and adaptive context modeling, and embed them into two novel lightweight blocks, which are named dynamic multi-scale context block and dynamic global context block. These two blocks, as the basic component units of our Dite-HRNet, are specially designed for the high-resolution networks to make full use of the parallel multi-resolution architecture. Experimental results show that the proposed network achieves superior performance on both COCO and MPII human pose estimation datasets, surpassing the state-of-the-art lightweight networks. Code is available at: \url{https://github.com/ZiyiZhang27/Dite-HRNet}.