Modelling long-range contextual relationships is critical for pixel-wise prediction tasks such as semantic segmentation. However, convolutional neural networks (CNNs) are inherently limited in modelling such dependencies due to the naive structure of their building blocks (\eg, the local convolution kernel). While recent global aggregation methods are beneficial for modelling long-range structural information, they tend to oversmooth and introduce noise in regions containing fine details (\eg,~boundaries and small objects), which matter greatly for the semantic segmentation task. To alleviate this problem, we propose to exploit the local context so that the aggregated long-range relationships are distributed more accurately within local regions. In particular, we design a novel local distribution module which adaptively models the affinity map between the global and local relationships for each pixel. Combined with existing global aggregation modules, our approach can be modularized as an end-to-end trainable block and easily plugged into existing semantic segmentation networks, giving rise to the \emph{GALD} networks. Despite its simplicity and versatility, our approach sets a new state of the art on major semantic segmentation benchmarks, including Cityscapes, ADE20K, Pascal Context, CamVid and COCO-Stuff. Code and trained models are released at \url{https://github.com/lxtGH/GALD-DGCNet} to foster further research.
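To make the description above concrete, the following is a minimal PyTorch-style sketch of the global-aggregation-then-local-distribution idea: a global context is aggregated, and a per-pixel affinity map predicted from local context decides how strongly that global context is redistributed at each location. All module and parameter names (\texttt{GlobalAggregation}, \texttt{LocalDistribution}, \texttt{GALDBlock}, \texttt{reduction}) are illustrative assumptions and do not reproduce the released implementation; see the repository linked above for the actual code.

\begin{lstlisting}[language=Python]
# Minimal, illustrative sketch (not the released GALD implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F


class GlobalAggregation(nn.Module):
    """Toy global context aggregation via global average pooling."""

    def __init__(self, channels):
        super().__init__()
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        # Aggregate long-range context into one descriptor and broadcast it.
        context = F.adaptive_avg_pool2d(x, 1)      # (B, C, 1, 1)
        context = self.proj(context)
        return context.expand_as(x)                # (B, C, H, W)


class LocalDistribution(nn.Module):
    """Predict a per-pixel affinity map from local context to reweight
    the globally aggregated features."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),                          # affinity values in [0, 1]
        )

    def forward(self, x, global_feat):
        affinity = self.local(x)                   # (B, C, H, W)
        # Distribute the global features according to the local affinity map.
        return x + affinity * global_feat


class GALDBlock(nn.Module):
    """Global aggregation followed by local distribution; end-to-end trainable."""

    def __init__(self, channels):
        super().__init__()
        self.aggregate = GlobalAggregation(channels)
        self.distribute = LocalDistribution(channels)

    def forward(self, x):
        global_feat = self.aggregate(x)
        return self.distribute(x, global_feat)


if __name__ == "__main__":
    feat = torch.randn(2, 64, 32, 32)              # backbone feature map
    out = GALDBlock(64)(feat)
    print(out.shape)                               # torch.Size([2, 64, 32, 32])
\end{lstlisting}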