The non-local network has become a widely used technique for semantic segmentation: it computes an attention map that measures the relationship between every pair of pixels. However, most popular non-local models overlook the fact that the computed attention map is often very noisy, containing inter-class and intra-class inconsistencies that lower the accuracy and reliability of non-local methods. In this paper, we figuratively refer to these inconsistencies as attention noise and explore how to remove it. Specifically, we propose a Denoised Non-Local Network (Denoised NL), which consists of two primary modules, i.e., the Global Rectifying (GR) block and the Local Retention (LR) block, to eliminate inter-class and intra-class noise, respectively. First, GR uses class-level predictions to build a binary map that indicates whether two selected pixels belong to the same category. Second, LR captures the otherwise ignored local dependencies and uses them to rectify the unwanted hollows in the attention map. Experimental results on two challenging semantic segmentation datasets demonstrate the superior performance of our model. Without any external training data, our proposed Denoised NL achieves state-of-the-art performance of 83.5\% and 46.69\% mIoU on Cityscapes and ADE20K, respectively.
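To make the global rectification idea more concrete, the sketch below shows one plausible form of it: a dot-product non-local affinity is masked by a binary same-class map derived from class-level predictions before softmax normalization. This is only an illustrative PyTorch sketch under assumed shapes and operations; the function name denoised\_nonlocal\_sketch is hypothetical, the paper's actual GR formulation may differ, and the LR block (local retention) is not modeled here.

\begin{verbatim}
import torch
import torch.nn.functional as F

def denoised_nonlocal_sketch(feat, class_logits):
    # feat: (B, C, H, W) feature map; class_logits: (B, K, H, W) class scores.
    B, C, H, W = feat.shape
    N = H * W
    x = feat.view(B, C, N)                       # flatten spatial dimensions

    # Standard non-local affinity between every pair of pixels.
    affinity = torch.bmm(x.transpose(1, 2), x)   # (B, N, N)

    # GR-style rectification (assumed form): a binary map marking whether
    # two pixels share the same predicted category, used to suppress
    # inter-class entries before normalization.
    labels = class_logits.argmax(dim=1).view(B, N)            # (B, N)
    same_class = labels.unsqueeze(2) == labels.unsqueeze(1)   # (B, N, N), bool

    attn = F.softmax(affinity.masked_fill(~same_class, float('-inf')), dim=-1)

    # Aggregate features with the rectified attention map.
    out = torch.bmm(x, attn.transpose(1, 2)).view(B, C, H, W)
    return out
\end{verbatim}

In this reading, the binary map acts as a hard gate on the affinity matrix, so each pixel only attends to pixels predicted to share its class, which is one way the inter-class noise described above could be suppressed.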