Growing evidence shows that strengthening layer interactions can enhance the representation power of a deep neural network, while self-attention excels at learning interdependencies by retrieving query-activated information. Motivated by this, we devise a cross-layer attention mechanism, called multi-head recurrent layer attention (MRLA), that sends a query representation of the current layer to all previous layers to retrieve query-related information from different levels of receptive fields. A lightweight version of MRLA is also proposed to reduce the quadratic computation cost. The proposed layer attention mechanism can enrich the representation power of many state-of-the-art vision networks, including CNNs and vision transformers. Its effectiveness has been extensively evaluated on image classification, object detection and instance segmentation tasks, where consistent improvements are observed. For example, MRLA improves the Top-1 accuracy of ResNet-50 by 1.6% while introducing only 0.16M parameters and 0.07B FLOPs. Surprisingly, it boosts performance by a large margin of 3-4% box AP and mask AP on dense prediction tasks. Our code is available at https://github.com/joyfang1106/MRLA.
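To illustrate the core idea of cross-layer attention described above, the following is a minimal, hypothetical PyTorch sketch: the current layer pools its features into a query, which attends over key/value summaries of all previous layers. It is not the released MRLA implementation (multi-head splitting, the recurrent/lightweight variant, and the spatial handling are omitted), and all module and parameter names are illustrative assumptions.

```python
# Minimal sketch of cross-layer ("layer") attention, NOT the authors' MRLA code
# (see https://github.com/joyfang1106/MRLA for the official implementation).
from typing import List

import torch
import torch.nn as nn


class SimpleLayerAttention(nn.Module):
    """Lets the current layer's features query the outputs of all previous layers."""

    def __init__(self, channels: int):
        super().__init__()
        self.q_proj = nn.Linear(channels, channels)  # query from the current layer
        self.k_proj = nn.Linear(channels, channels)  # keys from previous layers
        self.v_proj = nn.Linear(channels, channels)  # values from previous layers
        self.scale = channels ** -0.5

    def forward(self, current: torch.Tensor, previous: List[torch.Tensor]) -> torch.Tensor:
        # current:  (B, C) pooled features of the current layer
        # previous: list of (B, C) pooled features of all layers seen so far
        q = self.q_proj(current).unsqueeze(1)          # (B, 1, C)
        feats = torch.stack(previous, dim=1)           # (B, L, C)
        k = self.k_proj(feats)                         # (B, L, C)
        v = self.v_proj(feats)                         # (B, L, C)
        attn = torch.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)  # (B, 1, L)
        return (attn @ v).squeeze(1)                   # (B, C) layer-attended output
```

Note that attending to all previous layers at every layer is what gives the quadratic-in-depth computation cost that the lightweight variant mentioned above is designed to reduce.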