Semantic segmentation requires a per-pixel prediction for a given image. Typically, the output resolution of a segmentation network is severely reduced by the downsampling operations in the CNN backbone. Most previous methods employ upsampling decoders to recover the spatial resolution, and various decoders have been designed in the literature. Here, we propose a novel decoder, termed the dynamic neural representational decoder (NRD), which is simple yet significantly more efficient. As each location on the encoder's output corresponds to a local patch of the semantic labels, in this work we represent these local patches of labels with compact neural networks. This neural representation enables our decoder to leverage the smoothness prior in the semantic label space, and thus makes our decoder more efficient. Furthermore, these neural representations are dynamically generated and conditioned on the outputs of the encoder network. The desired semantic labels can be efficiently decoded from the neural representations, resulting in high-resolution semantic segmentation predictions. We empirically show that our proposed decoder outperforms the decoder in DeepLabV3+ with only 30% of the computational complexity, and achieves performance competitive with methods using dilated encoders with only 15% of the computation. Experiments on the Cityscapes, ADE20K, and PASCAL Context datasets demonstrate the effectiveness and efficiency of our proposed method.
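To make the idea of representing each local label patch with a compact, dynamically generated network more concrete, the following is a minimal NumPy sketch and not the NRD architecture itself. It assumes a hypothetical linear "controller" (`controller_w`, `controller_b`) that maps every encoder feature vector to the weights of a tiny two-layer MLP, and it assumes that this MLP takes normalized in-patch coordinates as input and returns class logits. The function name `decode_patches`, the patch size, the hidden width, and the coordinate input are all illustrative assumptions, not details stated above.

```python
import numpy as np


def decode_patches(features, controller_w, controller_b,
                   patch=8, hidden=4, num_classes=3):
    """Decode an (H, W, C) feature map into (H*patch, W*patch, num_classes) logits.

    Hypothetical sketch: each spatial location gets its own tiny MLP
    (2 -> hidden -> num_classes) whose parameters are predicted from the
    encoder feature at that location.
    """
    H, W, C = features.shape

    # Number of parameters in one per-location MLP.
    n_w1, n_b1 = 2 * hidden, hidden
    n_w2, n_b2 = hidden * num_classes, num_classes
    n_params = n_w1 + n_b1 + n_w2 + n_b2
    assert controller_w.shape == (C, n_params)
    assert controller_b.shape == (n_params,)

    # Assumed controller: a linear map from features to MLP parameters.
    theta = features @ controller_w + controller_b          # (H, W, n_params)
    w1, b1, w2, b2 = np.split(
        theta, [n_w1, n_w1 + n_b1, n_w1 + n_b1 + n_w2], axis=-1)
    w1 = w1.reshape(H, W, 2, hidden)
    w2 = w2.reshape(H, W, hidden, num_classes)

    # Normalized (y, x) coordinates of every pixel inside one patch.
    t = np.linspace(-1.0, 1.0, patch)
    yy, xx = np.meshgrid(t, t, indexing="ij")
    coords = np.stack([yy, xx], axis=-1).reshape(-1, 2)     # (patch*patch, 2)

    # Evaluate every per-location MLP on all in-patch coordinates.
    h = np.einsum("pi,hwij->hwpj", coords, w1) + b1[:, :, None, :]
    h = np.maximum(h, 0.0)                                   # ReLU
    logits = np.einsum("hwpj,hwjk->hwpk", h, w2) + b2[:, :, None, :]

    # Tile the decoded patches back into one high-resolution map.
    logits = logits.reshape(H, W, patch, patch, num_classes)
    logits = logits.transpose(0, 2, 1, 3, 4)
    return logits.reshape(H * patch, W * patch, num_classes)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((4, 6, 16))                  # toy encoder output
    n_params = 2 * 4 + 4 + 4 * 3 + 3
    cw = 0.1 * rng.standard_normal((16, n_params))
    cb = np.zeros(n_params)
    labels = decode_patches(feats, cw, cb).argmax(axis=-1)
    print(labels.shape)
```

In this sketch, each low-resolution location stores only the parameters of a small smooth function rather than a dense block of logits, which is one way to read the smoothness prior mentioned above: a tiny MLP over coordinates can only express slowly varying label patterns within a patch.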