Both performance and efficiency are important to semantic segmentation. State-of-the-art semantic segmentation algorithms are mostly based on dilated Fully Convolutional Networks (dilatedFCN), which adopt dilated convolutions in the backbone network to extract high-resolution feature maps and thereby achieve high segmentation performance. However, because many convolution operations are performed on these high-resolution feature maps, dilatedFCN-based methods incur high computational complexity and memory consumption. To balance performance and efficiency, encoder-decoder structures have also been proposed; they gradually recover spatial information by combining multi-level feature maps from the encoder. However, the performance of existing encoder-decoder methods falls well short of that of dilatedFCN-based methods. In this paper, we propose EfficientFCN, whose backbone is a common ImageNet pre-trained network without any dilated convolution. A holistically-guided decoder is introduced to obtain high-resolution, semantic-rich feature maps from the multi-scale features of the encoder. The decoding task is converted into novel codebook generation and codeword assembly tasks, which take advantage of both the high-level and low-level features from the encoder. This framework achieves comparable or even better performance than state-of-the-art methods with only 1/3 of the computational cost. Extensive experiments on PASCAL Context, PASCAL VOC, and ADE20K validate the effectiveness of the proposed EfficientFCN.
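As a rough illustration of the codebook generation and codeword assembly idea, the following minimal Python sketch shows one possible reading of it. It is an assumption-laden toy, not the EfficientFCN implementation: the function names (`generate_codebook`, `assemble`), the softmax spatial weighting, and the plain linear combination are all hypothetical choices, and in a real network the weighting scores and assembly coefficients would be predicted by learned layers rather than passed in by hand.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def generate_codebook(low_res_feats, weight_logits):
    """Build holistic codewords from low-resolution, high-level features.

    low_res_feats: list of P feature vectors (length C), one per coarse position.
    weight_logits: list of n score lists (length P), one per codeword.
    Assumption: each codeword is a softmax-weighted average over all positions.
    """
    num_channels = len(low_res_feats[0])
    codebook = []
    for logits in weight_logits:
        weights = softmax(logits)
        codeword = [
            sum(w * feat[c] for w, feat in zip(weights, low_res_feats))
            for c in range(num_channels)
        ]
        codebook.append(codeword)
    return codebook


def assemble(codebook, coeffs):
    """Produce one feature vector per high-resolution position.

    coeffs: list of Q coefficient lists (length n), one per fine position.
    Assumption: each output is a linear combination of the n codewords.
    """
    num_channels = len(codebook[0])
    return [
        [
            sum(a * word[c] for a, word in zip(position_coeffs, codebook))
            for c in range(num_channels)
        ]
        for position_coeffs in coeffs
    ]


# Toy usage: 2 coarse positions, 2 channels, 1 codeword, 2 fine positions.
toy_codebook = generate_codebook([[1.0, 0.0], [0.0, 1.0]], [[0.0, 0.0]])
toy_output = assemble(toy_codebook, [[2.0], [0.0]])
```

Under these assumptions, the high-resolution output costs roughly Q x n x C multiply-adds for Q fine positions, n codewords, and C channels, while the backbone itself only ever runs at the coarse resolution.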