High-resolution networks exhibit a remarkable capability for extracting multi-scale features for human pose estimation, but they fail to capture long-range interactions between joints and incur high computational complexity. To address these problems, we present a Dynamic lightweight High-Resolution Network (Dite-HRNet), which can efficiently extract multi-scale contextual information and model long-range spatial dependencies for human pose estimation. Specifically, we propose two methods, dynamic split convolution and adaptive context modeling, and embed them into two novel lightweight blocks, named the dynamic multi-scale context block and the dynamic global context block. These two blocks, as the basic components of our Dite-HRNet, are specially designed for high-resolution networks to make full use of the parallel multi-resolution architecture. Experimental results show that the proposed network achieves superior performance on both the COCO and MPII human pose estimation datasets, surpassing state-of-the-art lightweight networks. Code is available at: https://github.com/ZiyiZhang27/Dite-HRNet.