In this paper we investigate the amount of spatial context required for channel attention. To this end we study the popular squeeze-and-excite (SE) block, a simple and lightweight channel attention mechanism. SE blocks and their numerous variants commonly use global average pooling (GAP) to create a single descriptor for each channel. Here, we empirically analyze the amount of spatial context needed for effective channel attention and find that limited local context, on the order of seven rows or columns of the original image, is sufficient to match the performance of global context. We propose tiled squeeze-and-excite (TSE), a framework for building SE-like blocks that employ several descriptors per channel, with each descriptor based on local context only. We further show that TSE is a drop-in replacement for the SE block and can be used in existing SE networks without re-training. This implies that local context descriptors are similar both to each other and to the global context descriptor. Finally, we show that TSE has important practical implications for the deployment of SE networks to dataflow AI accelerators because of its reduced pipeline buffering requirements. For example, using TSE reduces the amount of activation pipeline buffering in EfficientDet-D2 by 90% compared to SE (from 50M to 4.77M) without loss of accuracy. Our code and pre-trained models will be publicly available.
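The difference between a single global descriptor and several local descriptors per channel can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the choice of tiles as horizontal bands of `tile_rows` rows spanning the full width, the reduction ratio, and the random weights are all assumptions made for illustration. The same excitation weights are shared by both blocks, mirroring the claim that TSE can reuse the weights of an existing SE block.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def excite(desc, w1, b1, w2, b2):
    """Two-layer bottleneck (ReLU, then sigmoid) applied to a (C,) descriptor."""
    hidden = np.maximum(desc @ w1 + b1, 0.0)
    return sigmoid(hidden @ w2 + b2)

def se_block(x, params):
    """Standard SE on x of shape (H, W, C): one descriptor per channel via GAP."""
    desc = x.mean(axis=(0, 1))            # global average pooling -> (C,)
    return x * excite(desc, *params)      # broadcast the channel scales over H, W

def tse_block(x, params, tile_rows=7):
    """Tiled SE sketch: one descriptor per channel for each band of rows.

    The band-of-rows tile shape is an assumption for this example.
    """
    out = np.empty_like(x)
    for start in range(0, x.shape[0], tile_rows):
        tile = x[start:start + tile_rows]         # local context only
        desc = tile.mean(axis=(0, 1))             # local average pooling -> (C,)
        out[start:start + tile_rows] = tile * excite(desc, *params)
    return out

# Hypothetical sizes: 8 channels, reduction ratio 2, a 14x10 feature map.
rng = np.random.default_rng(0)
channels, reduced = 8, 4
params = (
    rng.standard_normal((channels, reduced)), np.zeros(reduced),
    rng.standard_normal((reduced, channels)), np.zeros(channels),
)
feature_map = rng.standard_normal((14, 10, channels))

se_out = se_block(feature_map, params)
tse_out = tse_block(feature_map, params, tile_rows=7)
```

In this sketch each band needs only its own rows before it can be rescaled, whereas `se_block` must see the whole feature map first. That is the intuition behind the reduced buffering requirement. When `tile_rows` covers the full height, the two functions compute the same result.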