Attention mechanisms, especially channel attention, have achieved great success in the computer vision field. Many works focus on how to design efficient channel attention mechanisms while ignoring a fundamental problem: the channel attention mechanism uses a scalar to represent each channel, which is difficult due to massive information loss. In this work, we start from a different view and regard the channel representation problem as a compression process using frequency analysis. Based on the frequency analysis, we mathematically prove that conventional global average pooling is a special case of feature decomposition in the frequency domain. With this proof, we naturally generalize the compression of the channel attention mechanism to the frequency domain and propose our method with multi-spectral channel attention, termed FcaNet. FcaNet is simple but effective. Our method can be implemented by changing a few lines of code within existing channel attention methods. Moreover, the proposed method achieves state-of-the-art results compared with other channel attention methods on image classification, object detection, and instance segmentation tasks. Our method consistently outperforms the baseline SENet with the same number of parameters and the same computational cost. Our code and models are publicly available at https://github.com/cfzd/FcaNet.
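The core observations above can be sketched in a few lines of NumPy. This is a minimal illustration, not the official implementation (which is at the GitHub URL above): `dct_basis`, `freq_pool`, and `multi_spectral_pool` are hypothetical helper names, and the frequency assignment per channel group is an assumption for demonstration. It shows (a) that projecting a feature map onto the lowest-frequency 2D DCT basis recovers global average pooling up to a constant, and (b) how multi-spectral pooling assigns different frequencies to different channel groups.

```python
import numpy as np

def dct_basis(u, v, H, W):
    """2D DCT-II basis function at frequency (u, v), shape (H, W)."""
    i = np.arange(H)[:, None]
    j = np.arange(W)[None, :]
    return np.cos(np.pi * u * (i + 0.5) / H) * np.cos(np.pi * v * (j + 0.5) / W)

def freq_pool(x, u, v):
    """Compress each channel to a scalar by projecting onto the DCT basis (u, v).
    x: (C, H, W) feature map -> (C,) channel descriptor."""
    C, H, W = x.shape
    return (x * dct_basis(u, v, H, W)).sum(axis=(1, 2))

def multi_spectral_pool(x, freqs):
    """Multi-spectral pooling sketch: split channels into len(freqs) groups,
    each group compressed with its own DCT frequency instead of only (0, 0)."""
    C = x.shape[0]
    groups = np.array_split(np.arange(C), len(freqs))
    out = np.empty(C)
    for idx, (u, v) in zip(groups, freqs):
        out[idx] = freq_pool(x[idx], u, v)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))

# At the lowest frequency (u = v = 0) the basis is all ones, so the
# projection equals H * W times global average pooling -- GAP is the
# special case of the frequency-domain decomposition.
assert np.allclose(freq_pool(x, 0, 0), x.mean(axis=(1, 2)) * 4 * 4)

# Multi-spectral descriptor: first half of channels uses (0, 0),
# second half uses the (1, 0) frequency component.
desc = multi_spectral_pool(x, [(0, 0), (1, 0)])
```

In a channel attention module, `desc` would replace the GAP output and be fed to the usual excitation layers (e.g., the fully connected layers in SENet), which is why the change amounts to only a few lines of code.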