Semantic image segmentation is one of the most challenging tasks in computer vision. In this paper, we propose a highly fused convolutional network that consists of three parts: feature downsampling, combined feature upsampling, and multiple predictions. We adopt a strategy of multi-step upsampling, combining the feature maps of the pooling layers with those of their corresponding unpooling layers. We then produce multiple pre-outputs, each generated from an unpooling layer by one-step upsampling. Finally, we concatenate these pre-outputs to obtain the final output. As a result, the proposed network makes extensive use of feature information by fusing and reusing feature maps. In addition, when training our model, we add multiple soft cost functions on the pre-outputs as well as the final output. In this way, we reduce the attenuation of the loss as it is back-propagated. We evaluate our model on three major segmentation datasets: CamVid, PASCAL VOC, and ADE20K. We achieve state-of-the-art performance on the CamVid dataset, as well as considerable improvements on the PASCAL VOC and ADE20K datasets.
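The following is a minimal sketch, not the authors' implementation, of the idea described above: a PyTorch-style encoder-decoder in which each unpooling stage is fused with its corresponding pooling-stage feature maps, each decoder stage emits a pre-output by one-step upsampling, and the pre-outputs are concatenated to form the final prediction. All layer sizes and channel counts here are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HighlyFusedSegSketch(nn.Module):
    """Illustrative sketch of the three-part design: feature downsampling,
    combined feature upsampling, and multiple predictions (hypothetical sizes)."""

    def __init__(self, num_classes=21):
        super().__init__()
        # Feature downsampling: two conv + max-pool stages.
        self.enc1 = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(64, 128, 3, padding=1), nn.ReLU())
        self.pool = nn.MaxPool2d(2, return_indices=True)
        # Combined feature upsampling: unpooling fused with the matching encoder maps.
        self.unpool = nn.MaxUnpool2d(2)
        self.dec2 = nn.Sequential(nn.Conv2d(128 + 128, 128, 3, padding=1), nn.ReLU())
        self.reduce = nn.Conv2d(128, 64, 1)  # match channels before the second unpooling
        self.dec1 = nn.Sequential(nn.Conv2d(64 + 64, 64, 3, padding=1), nn.ReLU())
        # Multiple predictions: one pre-output per decoder stage, then a fused final output.
        self.pre2 = nn.Conv2d(128, num_classes, 1)
        self.pre1 = nn.Conv2d(64, num_classes, 1)
        self.final = nn.Conv2d(2 * num_classes, num_classes, 1)

    def forward(self, x):
        h, w = x.shape[2:]
        f1 = self.enc1(x);  p1, i1 = self.pool(f1)   # H   x W   -> H/2 x W/2
        f2 = self.enc2(p1); p2, i2 = self.pool(f2)   # H/2 x W/2 -> H/4 x W/4
        # Fuse each unpooled map with the encoder features from the matching scale.
        u2 = self.dec2(torch.cat([self.unpool(p2, i2), f2], dim=1))                # H/2 x W/2
        u1 = self.dec1(torch.cat([self.unpool(self.reduce(u2), i1), f1], dim=1))   # H   x W
        # Pre-outputs from each decoder stage, brought to full resolution.
        pre2 = F.interpolate(self.pre2(u2), size=(h, w), mode='bilinear', align_corners=False)
        pre1 = self.pre1(u1)
        # Final output from the concatenated pre-outputs.
        out = self.final(torch.cat([pre1, pre2], dim=1))
        return out, [pre1, pre2]

# Training-time loss with soft auxiliary terms on the pre-outputs; the 0.4 weight is
# an assumed value for illustration, not taken from the paper.
def total_loss(out, pre_outputs, labels):
    loss = F.cross_entropy(out, labels)
    loss += 0.4 * sum(F.cross_entropy(p, labels) for p in pre_outputs)
    return loss
```

Attaching a cost function to every pre-output in addition to the final output is one common way to realize the "multiple soft cost functions" idea, since each auxiliary term injects gradient signal directly at an intermediate decoder stage rather than relying solely on the gradient propagated back from the final layer.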