Many machine vision applications require predictions for every pixel of the input image (for example, semantic segmentation and boundary detection). Models for such problems usually consist of an encoder, which decreases spatial resolution while learning a high-dimensional representation, followed by a decoder, which recovers the original input resolution and produces low-dimensional predictions. While encoders have been studied rigorously, relatively few studies address the decoder side. This paper therefore presents an extensive comparison of a variety of decoders on a variety of pixel-wise prediction tasks. Our contributions are: (1) Decoders matter: we observe significant variance in results between different types of decoders on various problems. (2) We introduce a novel decoder: bilinear additive upsampling. (3) We introduce new residual-like connections for decoders. (4) We identify two decoder types which give consistently high performance.
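The abstract names bilinear additive upsampling without defining it. The following is a minimal sketch, assuming the operation is a parameter-free layer that bilinearly upsamples the feature map by a factor of 2 and then sums every group of 4 consecutive channels, so that the channel count drops by 4 while the spatial size grows by 4. The function names, the fixed factor of 2, the group size of 4, and the half-pixel-centre interpolation convention are all assumptions made for illustration, not details taken from the text above.

```python
import numpy as np


def bilinear_upsample_2x(x):
    """Bilinearly upsample an (H, W, C) array to (2H, 2W, C).

    Assumes half-pixel-centre sampling with edge clamping.
    """
    h, w, _ = x.shape

    def coords(n):
        # Source coordinate of each output pixel centre, clamped to the grid.
        src = np.clip((np.arange(2 * n) + 0.5) / 2.0 - 0.5, 0, n - 1)
        lo = np.floor(src).astype(int)
        hi = np.minimum(lo + 1, n - 1)
        return lo, hi, src - lo

    y0, y1, fy = coords(h)
    x0, x1, fx = coords(w)

    # Interpolate along the height axis, then along the width axis.
    rows = x[y0] * (1 - fy)[:, None, None] + x[y1] * fy[:, None, None]
    return rows[:, x0] * (1 - fx)[None, :, None] + rows[:, x1] * fx[None, :, None]


def bilinear_additive_upsample(x, group=4):
    """Upsample by 2x, then sum each run of `group` consecutive channels.

    `group=4` is an assumed default; C must be divisible by `group`.
    """
    h, w, c = x.shape
    if c % group != 0:
        raise ValueError("channel count must be divisible by group")
    up = bilinear_upsample_2x(x)
    return up.reshape(2 * h, 2 * w, c // group, group).sum(axis=-1)


features = np.ones((3, 5, 8))
upsampled = bilinear_additive_upsample(features)
print(upsampled.shape)
```

Under these assumptions the layer has no learned weights, and the total number of activations (height × width × channels) is the same before and after the operation.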