Semantic segmentation is a key task in computer vision: assigning a category label to each pixel in an image. Despite significant recent progress, most existing methods still suffer from two challenging issues: 1) the sizes of objects and stuff in an image can be very diverse, demanding the incorporation of multi-scale features into fully convolutional networks (FCNs); 2) pixels close to or on the boundaries of objects/stuff are hard to classify due to intrinsic weaknesses of convolutional networks. To address the first issue, we propose a new Multi-Receptive Field Module (MRFM) that explicitly takes multi-scale features into account. For the second issue, we design an edge-aware loss that is effective in distinguishing the boundaries of objects/stuff. With these two designs, our Multi-Receptive Field Network achieves new state-of-the-art results on two widely used semantic segmentation benchmark datasets: a mean IoU of 83.0 on the Cityscapes dataset and 88.4 mean IoU on the Pascal VOC2012 dataset.
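The abstract does not give the exact form of the edge-aware loss; a common way to realize the idea is to up-weight the per-pixel cross-entropy at class boundaries extracted from the label map. The sketch below (numpy, with hypothetical function names `edge_mask` and `edge_aware_ce`; the `edge_weight` factor is an assumed hyperparameter, not from the paper) illustrates that scheme, assuming `probs` is an H×W×C array of softmax outputs and `labels` an H×W integer label map:

```python
import numpy as np

def edge_mask(labels):
    # Mark a pixel as a boundary pixel if its right or bottom
    # neighbor carries a different class label.
    edges = np.zeros_like(labels, dtype=bool)
    edges[:, :-1] |= labels[:, :-1] != labels[:, 1:]
    edges[:-1, :] |= labels[:-1, :] != labels[1:, :]
    return edges

def edge_aware_ce(probs, labels, edge_weight=2.0):
    """Weighted-average per-pixel cross-entropy, up-weighted at boundaries.

    probs:  (H, W, C) softmax probabilities
    labels: (H, W)    integer class indices
    """
    h, w, _ = probs.shape
    # Cross-entropy of the true class at every pixel.
    pix_ce = -np.log(probs[np.arange(h)[:, None],
                           np.arange(w)[None, :],
                           labels] + 1e-12)
    # Boundary pixels contribute edge_weight times as much.
    weights = np.where(edge_mask(labels), edge_weight, 1.0)
    return float((weights * pix_ce).sum() / weights.sum())
```

With uniform predictions the loss reduces to log C regardless of the weighting, so the boundary term only matters once the network starts committing to classes, which is exactly when boundary pixels become the hard cases the abstract describes.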