We present an efficient high-resolution network, Lite-HRNet, for human pose estimation. We start by simply applying the efficient shuffle block from ShuffleNet to HRNet (high-resolution network), which yields stronger performance than popular lightweight networks such as MobileNet, ShuffleNet, and Small HRNet. We find that the heavily used pointwise (1x1) convolutions in shuffle blocks become the computational bottleneck. We therefore introduce a lightweight unit, conditional channel weighting, to replace the costly pointwise (1x1) convolutions in shuffle blocks. The complexity of channel weighting is linear with respect to the number of channels, which is lower than the quadratic complexity of pointwise convolutions. Our solution learns the weights from all the channels and over multiple resolutions, which are readily available in the parallel branches of HRNet. It uses the weights as a bridge to exchange information across channels and resolutions, compensating for the role played by the pointwise (1x1) convolution. Lite-HRNet demonstrates superior results on human pose estimation compared with popular lightweight networks. Moreover, Lite-HRNet can be easily applied to semantic segmentation tasks in the same lightweight manner. The code and models are publicly available at https://github.com/HRNet/Lite-HRNet.
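The linear-versus-quadratic claim can be illustrated with a rough operation count. The sketch below is not taken from the Lite-HRNet code: the function names, the 64x48 feature-map size, and the channel counts are assumptions chosen for illustration. It also counts only the cost of applying per-element weights, not the cost of computing them.

```python
def pointwise_conv_macs(height, width, channels):
    """Multiply-accumulates for a 1x1 convolution with `channels` input
    and `channels` output channels: every output value at every position
    sums over all input channels, so the cost is quadratic in `channels`."""
    return height * width * channels * channels


def channel_weighting_macs(height, width, channels):
    """Multiplications needed to scale each feature-map element by one
    weight (element-wise), which is linear in `channels`.
    Assumption: the cost of producing the weights is ignored here."""
    return height * width * channels


if __name__ == "__main__":
    # Hypothetical feature-map size and channel counts, for illustration only.
    for c in (40, 80, 160):
        pw = pointwise_conv_macs(64, 48, c)
        cw = channel_weighting_macs(64, 48, c)
        print(f"channels={c}: pointwise={pw}, weighting={cw}, ratio={pw // cw}")
```

Under these assumptions the ratio between the two counts equals the number of channels, so the saving grows as the branches get wider.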