Recommended by 新智元
Source: 极市平台
Author: Happy
This article covers a novel channel attention mechanism proposed by Zhejiang University. It cleverly combines channel attention with the Discrete Cosine Transform (DCT), extending conventional channel attention into the multi-spectral channel attention mechanism proposed in the paper: FcaLayer. The authors validate the approach on image classification, object detection, and instance segmentation; on ImageNet classification, the method achieves a 1.8% performance gain over SENet50.
Analysis from a Frequency-Domain Perspective
Method
Discrete Cosine Transform
The 2D-DCT is defined as follows (constant normalization factors are omitted for simplicity):

$$f^{2d}_{h,w} = \sum_{i=0}^{H-1}\sum_{j=0}^{W-1} x^{2d}_{i,j} \cos\left(\frac{\pi h}{H}\left(i+\frac{1}{2}\right)\right)\cos\left(\frac{\pi w}{W}\left(j+\frac{1}{2}\right)\right)$$

where $f^{2d} \in \mathbb{R}^{H \times W}$ denotes the 2D-DCT spectrum and $x^{2d} \in \mathbb{R}^{H \times W}$ is the input. The corresponding 2D-IDCT is defined as:

$$x^{2d}_{i,j} = \sum_{h=0}^{H-1}\sum_{w=0}^{W-1} f^{2d}_{h,w} \cos\left(\frac{\pi h}{H}\left(i+\frac{1}{2}\right)\right)\cos\left(\frac{\pi w}{W}\left(j+\frac{1}{2}\right)\right)$$

For simplicity, we use $B$ to denote the basis functions of the 2D-DCT:

$$B^{i,j}_{h,w} = \cos\left(\frac{\pi h}{H}\left(i+\frac{1}{2}\right)\right)\cos\left(\frac{\pi w}{W}\left(j+\frac{1}{2}\right)\right)$$
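Note that with $h = w = 0$ the basis $B^{i,j}_{0,0}$ is constant 1, so the lowest-frequency 2D-DCT coefficient is simply the sum over all spatial positions; global average pooling is this coefficient scaled by $1/(HW)$. Below is a minimal numeric check of this observation (plain PyTorch; the sizes are illustrative):

import torch

H, W = 7, 7
x = torch.randn(H, W)

# the (h=0, w=0) DCT basis is constant 1, so the DC coefficient is the plain sum
f_00 = x.sum()

# global average pooling is the same quantity scaled by 1/(H*W)
gap = x.mean()

print(torch.allclose(gap * H * W, f_00))  # True

This is what motivates multi-spectral channel attention: GAP keeps only a single frequency component of each channel, and the discarded components can carry complementary information.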
Experiments
COCO
The reference PyTorch implementation of FcaLayer is as follows:

import math

import torch
import torch.nn as nn


def get_1d_dct(i, freq, L):
    # i-th element of the 1D DCT basis for frequency index `freq`
    result = math.cos(math.pi * freq * (i + 0.5) / L)
    if freq == 0:
        return result
    else:
        return result * math.sqrt(2)


def get_dct_weights(width, height, channel, fidx_u, fidx_v):
    # precompute the multi-spectral DCT filter bank, shape (1, channel, width, height)
    dct_weights = torch.zeros(1, channel, width, height)

    # split channels for multi-spectral attention: one frequency pair per group
    c_part = channel // len(fidx_u)

    for i, (u_x, v_y) in enumerate(zip(fidx_u, fidx_v)):
        for t_x in range(width):
            for t_y in range(height):
                val = get_1d_dct(t_x, u_x, width) * get_1d_dct(t_y, v_y, height)
                dct_weights[:, i * c_part: (i + 1) * c_part, t_x, t_y] = val

    return dct_weights
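A quick look at what the precomputed filter bank contains, under illustrative settings (8 channels, 7×7 maps, two frequency pairs; these indices are placeholders, not the frequency set selected in the paper):

w = get_dct_weights(width=7, height=7, channel=8, fidx_u=[0, 1], fidx_v=[0, 0])
print(w.shape)           # torch.Size([1, 8, 7, 7])

# channels 0-3 hold the constant (0, 0) basis (GAP-like),
# channels 4-7 hold the (1, 0) basis, the lowest non-constant cosine along x
print(w[0, 0].unique())  # tensor([1.])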
class FcaLayer(nn.Module):
    def __init__(self, channels, reduction=16):
        super(FcaLayer, self).__init__()
        # placeholder kept from the original article; supply the spatial size and
        # frequency indices of your stage, e.g.
        # get_dct_weights(width, height, channels, fidx_u, fidx_v)
        self.register_buffer("precomputed_dct_weights", get_dct_weights(...))
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid()
        )

    def forward(self, x):
        n, c, _, _ = x.size()
        # multi-spectral pooling: one 2D-DCT coefficient per channel
        y = torch.sum(x * self.precomputed_dct_weights, dim=[2, 3])
        y = self.fc(y).view(n, c, 1, 1)
        return x * y.expand_as(x)
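Because the article leaves the arguments of `get_dct_weights(...)` as a placeholder, the snippet below reproduces the layer's forward computation with the DCT weights supplied explicitly; the spatial size, channel count, reduction ratio, and frequency indices are all illustrative assumptions:

n, c = 2, 64
x = torch.randn(n, c, 7, 7)

# stand-in for the registered buffer (two illustrative frequency pairs)
dct_w = get_dct_weights(7, 7, c, [0, 1], [0, 1])

fc = nn.Sequential(
    nn.Linear(c, c // 16, bias=False),
    nn.ReLU(inplace=True),
    nn.Linear(c // 16, c, bias=False),
    nn.Sigmoid(),
)

y = torch.sum(x * dct_w, dim=[2, 3])  # multi-spectral pooling
out = x * fc(y).view(n, c, 1, 1)      # channel-wise reweighting
print(out.shape)                      # torch.Size([2, 64, 7, 7])

Registering `dct_w` as `precomputed_dct_weights` in place of the `...` makes `FcaLayer` itself runnable end to end.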
Paper link:
https://arxiv.org/abs/2012.11879