Convolutional networks are powerful visual models that yield hierarchies of features. We show that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, exceed the state of the art in semantic segmentation. Our key insight is to build "fully convolutional" networks that take input of arbitrary size and produce correspondingly sized output with efficient inference and learning. We define and detail the space of fully convolutional networks, explain their application to spatially dense prediction tasks, and draw connections to prior models. We adapt contemporary classification networks (AlexNet, the VGG net, and GoogLeNet) into fully convolutional networks and transfer their learned representations by fine-tuning to the segmentation task. We then define a novel architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations. Our fully convolutional network achieves state-of-the-art segmentation on PASCAL VOC (a 20% relative improvement, to 62.2% mean IU, on VOC 2012), NYUDv2, and SIFT Flow, while inference takes one third of a second for a typical image.
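To make the "fully convolutional" idea concrete, here is a minimal sketch in plain Python. It assumes a single-channel input, a single output score, stride 1, and no padding. The function names `dense_score` and `conv_scores` and the toy weights are illustrative assumptions, not taken from the paper. The sketch shows that a fully connected layer over a fixed k x k input can be read as a convolution with a k x k kernel: on an input of exactly that size it gives one score, and on a larger input the same weights give a spatial map of scores.

```python
def dense_score(patch, weights, bias):
    """Fully connected layer: one score from a fixed k x k patch."""
    k = len(weights)
    return bias + sum(
        patch[i][j] * weights[i][j] for i in range(k) for j in range(k)
    )


def conv_scores(image, weights, bias):
    """Slide the same weights over an image of any size (stride 1, no padding)."""
    k = len(weights)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    return [
        [
            # Cut out the k x k window at (r, c) and score it with the dense layer.
            dense_score([row[c:c + k] for row in image[r:r + k]], weights, bias)
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Toy parameters (hypothetical): a 2 x 2 "classifier" with a bias term.
weights = [[1, 0], [0, 1]]
bias = 0.5

# On a 2 x 2 input the convolutional form reduces to the single dense score.
small = [[1, 2], [3, 4]]
single = conv_scores(small, weights, bias)

# On a larger 3 x 4 input the same weights yield a 2 x 3 map of scores.
large = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
score_map = conv_scores(large, weights, bias)
```

In a real network the output map is coarser than the input because of pooling and striding, so it must still be upsampled to pixel resolution; this sketch leaves that step out.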