Most existing human pose estimation (HPE) methods exploit multi-scale information by fusing feature maps of four different spatial sizes, i.e., $1/4$, $1/8$, $1/16$, and $1/32$ of the input image. This strategy has two drawbacks: 1) feature maps of different spatial sizes may not be well aligned spatially, which can hurt the accuracy of keypoint localization; 2) these scales are fixed and inflexible, which may restrict generalization across varying human sizes. To address these issues, we propose an adaptive dilated convolution (ADC). ADC generates and fuses multi-scale features of the same spatial size by setting different dilation rates for different channels. More importantly, these dilation rates are generated by a regression module, which enables ADC to adaptively adjust the fused scales and may therefore help it generalize better to various human sizes. ADC can be trained end to end and easily plugged into existing methods. Extensive experiments show that ADC brings consistent improvements to various HPE methods. The source code will be released for further research.
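The abstract does not describe how ADC is implemented, so the following is only a minimal sketch of the two ideas it names, under our own assumptions. First, each channel is convolved with its own dilation rate using "same" padding, so every scale keeps the input's spatial size and the multi-scale responses stay aligned. Second, the rates come from a small regression head. To let a regressed rate be non-integer (and thus learnable), we assume the kernel taps are read with bilinear interpolation, similar in spirit to deformable convolution; this is an assumption, not something the abstract states. The names `bilinear_sample`, `dilated_conv3x3`, `predict_rates`, and `adaptive_dilated_conv`, the depthwise (one kernel per channel) layout, and the `1 + softplus` rate parameterization are all hypothetical, and this pure-Python, single-image code is an illustration rather than the authors' method.

```python
import math
from typing import List

Grid = List[List[float]]  # one feature channel, stored as H rows of W values


def bilinear_sample(x: Grid, r: float, c: float) -> float:
    """Bilinearly sample x at fractional position (r, c); outside the map counts as 0."""
    H, W = len(x), len(x[0])
    r0, c0 = math.floor(r), math.floor(c)
    dr, dc = r - r0, c - c0
    value = 0.0
    for rr, wr in ((r0, 1.0 - dr), (r0 + 1, dr)):
        for cc, wc in ((c0, 1.0 - dc), (c0 + 1, dc)):
            if 0 <= rr < H and 0 <= cc < W:
                value += wr * wc * x[rr][cc]
    return value


def dilated_conv3x3(x: Grid, kernel: Grid, rate: float) -> Grid:
    """3x3 cross-correlation with dilation `rate` (may be fractional).

    Taps sit at offsets (-rate, 0, +rate) around every pixel, so the output keeps
    the input's H x W: the receptive field grows without any downsampling.
    """
    H, W = len(x), len(x[0])
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            acc = 0.0
            for ki in range(3):
                for kj in range(3):
                    acc += kernel[ki][kj] * bilinear_sample(
                        x, i + (ki - 1) * rate, j + (kj - 1) * rate)
            out[i][j] = acc
    return out


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return math.log1p(math.exp(-abs(z))) + max(z, 0.0)


def predict_rates(channels: List[Grid], weights: List[float],
                  biases: List[float]) -> List[float]:
    """Hypothetical regression head: channel mean -> linear -> 1 + softplus.

    The 1 + softplus form keeps every rate at least 1 and smooth in its inputs.
    """
    rates = []
    for x, w, b in zip(channels, weights, biases):
        mean = sum(map(sum, x)) / (len(x) * len(x[0]))
        rates.append(1.0 + softplus(w * mean + b))
    return rates


def adaptive_dilated_conv(channels: List[Grid], kernels: List[Grid],
                          rates: List[float]) -> List[Grid]:
    """Channel-wise dilated conv: channel k is convolved with its own rate rates[k].

    All outputs share the input's H x W, so they can be fused (e.g. by a 1x1
    conv) without resizing or re-aligning feature maps.
    """
    return [dilated_conv3x3(x, k, r) for x, k, r in zip(channels, kernels, rates)]


if __name__ == "__main__":
    ramp = [[float(c) for c in range(6)] for _ in range(4)]  # 4x6 map, value = column index
    right_tap = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]  # keeps only the (0, +rate) tap
    outs = adaptive_dilated_conv([ramp, ramp], [right_tap, right_tap], [2.0, 1.5])
    print(outs[0][0])  # [2.0, 3.0, 4.0, 5.0, 0.0, 0.0]
    print(outs[1][0])  # [1.5, 2.5, 3.5, 4.5, 2.5, 0.0]
```

With rate 1.5 the right-hand tap lands halfway between two columns and returns their average, so each output varies piecewise-linearly with the rate; in a framework with automatic differentiation, that is what would let gradients reach a regression head. A practical version would vectorize these loops (for instance with grouped convolutions or a framework sampling primitive) instead of iterating in Python.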