Multi-person pose estimation is an important but challenging problem in computer vision. Although current approaches have made significant progress by fusing multi-scale feature maps, they pay little attention to enhancing the channel-wise and spatial information of those feature maps. In this paper, we propose two novel modules that enhance this information for multi-person pose estimation. First, a Channel Shuffle Module (CSM) applies the channel shuffle operation to feature maps at different levels, promoting cross-channel information communication among the pyramid feature maps. Second, a Spatial, Channel-wise Attention Residual Bottleneck (SCARB) augments the original residual unit with an attention mechanism, adaptively highlighting the information of the feature maps in both the spatial and channel-wise contexts. We evaluate the proposed modules on the COCO keypoint benchmark, and experimental results show that our approach achieves state-of-the-art results.
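To make the two operations concrete, the following is a minimal NumPy sketch, not the implementation evaluated in the paper. It assumes the pyramid levels have already been brought to a common spatial size and channel count, uses the number of levels as the shuffle group count, and replaces the learned attention branches with parameter-free sigmoid gates over pooled statistics. The function names `channel_shuffle`, `shuffle_pyramid`, and `attention_residual` are hypothetical and chosen only for illustration.

```python
from typing import Sequence

import numpy as np


def channel_shuffle(x: np.ndarray, groups: int) -> np.ndarray:
    """Interleave the channels of a (C, H, W) array across `groups` groups."""
    c, h, w = x.shape
    if c % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    # (C, H, W) -> (groups, C // groups, H, W), swap the first two axes,
    # then flatten back so neighbouring channels come from different groups.
    x = x.reshape(groups, c // groups, h, w)
    x = x.transpose(1, 0, 2, 3)
    return x.reshape(c, h, w)


def shuffle_pyramid(levels: Sequence[np.ndarray]) -> np.ndarray:
    """Concatenate pyramid levels along channels and shuffle them.

    Assumption: every level is already (C, H, W) with identical C, H and W.
    """
    stacked = np.concatenate(list(levels), axis=0)
    return channel_shuffle(stacked, groups=len(levels))


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def attention_residual(x: np.ndarray, residual: np.ndarray) -> np.ndarray:
    """Gate a residual branch channel-wise and spatially, then add the identity.

    Assumption: the gates are parameter-free stand-ins for learned attention.
    """
    # Channel gate: one weight per channel from global average pooling.
    channel_gate = sigmoid(residual.mean(axis=(1, 2), keepdims=True))  # (C, 1, 1)
    # Spatial gate: one weight per location from the mean over channels.
    spatial_gate = sigmoid(residual.mean(axis=0, keepdims=True))  # (1, H, W)
    return x + residual * channel_gate * spatial_gate
```

After `shuffle_pyramid`, consecutive channels originate from different pyramid levels, so any later grouped or local channel operation sees information from every level. A learned variant would replace the two pooled gates with small trainable layers.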