Semantic image segmentation is one of the most challenged tasks in computer vision. In this paper, we propose a highly fused convolutional network, which consists of three parts: feature downsampling, combined feature upsampling and multiple predictions. We adopt a strategy of multiple steps of upsampling and combined feature maps in pooling layers with its corresponding unpooling layers. Then we bring out multiple pre-outputs, each pre-output is generated from an unpooling layer by one-step upsampling. Finally, we concatenate these pre-outputs to get the final output. As a result, our proposed network makes highly use of the feature information by fusing and reusing feature maps. In addition, when training our model, we add multiple soft cost functions on pre-outputs and final outputs. In this way, we can reduce the loss reduction when the loss is back propagated. We evaluate our model on three major segmentation datasets: CamVid, PASCAL VOC and ADE20K. We achieve a state-of-the-art performance on CamVid dataset, as well as considerable improvements on PASCAL VOC dataset and ADE20K dataset

3
下载
关闭预览

相关内容

在数学优化,统计学,计量经济学,决策理论,机器学习和计算神经科学中,代价函数,又叫损失函数或成本函数,它是将一个或多个变量的事件阈值映射到直观地表示与该事件。 一个优化问题试图最小化损失函数。 目标函数是损失函数或其负值,在这种情况下它将被最大化。

Recently, numerous handcrafted and searched networks have been applied for semantic segmentation. However, previous works intend to handle inputs with various scales in pre-defined static architectures, such as FCN, U-Net, and DeepLab series. This paper studies a conceptually new method to alleviate the scale variance in semantic representation, named dynamic routing. The proposed framework generates data-dependent routes, adapting to the scale distribution of each image. To this end, a differentiable gating function, called soft conditional gate, is proposed to select scale transform paths on the fly. In addition, the computational cost can be further reduced in an end-to-end manner by giving budget constraints to the gating function. We further relax the network level routing space to support multi-path propagations and skip-connections in each forward, bringing substantial network capacity. To demonstrate the superiority of the dynamic property, we compare with several static architectures, which can be modeled as special cases in the routing space. Extensive experiments are conducted on Cityscapes and PASCAL VOC 2012 to illustrate the effectiveness of the dynamic framework. Code is available at https://github.com/yanwei-li/DynamicRouting.

0
6
下载
预览

Radiologist is "doctor's doctor", biomedical image segmentation plays a central role in quantitative analysis, clinical diagnosis, and medical intervention. In the light of the fully convolutional networks (FCN) and U-Net, deep convolutional networks (DNNs) have made significant contributions in biomedical image segmentation applications. In this paper, based on U-Net, we propose MDUnet, a multi-scale densely connected U-net for biomedical image segmentation. we propose three different multi-scale dense connections for U shaped architectures encoder, decoder and across them. The highlights of our architecture is directly fuses the neighboring different scale feature maps from both higher layers and lower layers to strengthen feature propagation in current layer. Which can largely improves the information flow encoder, decoder and across them. Multi-scale dense connections, which means containing shorter connections between layers close to the input and output, also makes much deeper U-net possible. We adopt the optimal model based on the experiment and propose a novel Multi-scale Dense U-Net (MDU-Net) architecture with quantization. Which reduce overfitting in MDU-Net for better accuracy. We evaluate our purpose model on the MICCAI 2015 Gland Segmentation dataset (GlaS). The three multi-scale dense connections improve U-net performance by up to 1.8% on test A and 3.5% on test B in the MICCAI Gland dataset. Meanwhile the MDU-net with quantization achieves the superiority over U-Net performance by up to 3% on test A and 4.1% on test B.

1
8
下载
预览

We present an end-to-end method for the task of panoptic segmentation. The method makes instance segmentation and semantic segmentation predictions in a single network, and combines these outputs using heuristics to create a single panoptic segmentation output. The architecture consists of a ResNet-50 feature extractor shared by the semantic segmentation and instance segmentation branch. For instance segmentation, a Mask R-CNN type of architecture is used, while the semantic segmentation branch is augmented with a Pyramid Pooling Module. Results for this method are submitted to the COCO and Mapillary Joint Recognition Challenge 2018. Our approach achieves a PQ score of 17.6 on the Mapillary Vistas validation set and 27.2 on the COCO test-dev set.

0
4
下载
预览

For the challenging semantic image segmentation task the most efficient models have traditionally combined the structured modelling capabilities of Conditional Random Fields (CRFs) with the feature extraction power of CNNs. In more recent works however, CRF post-processing has fallen out of favour. We argue that this is mainly due to the slow training and inference speeds of CRFs, as well as the difficulty of learning the internal CRF parameters. To overcome both issues we propose to add the assumption of conditional independence to the framework of fully-connected CRFs. This allows us to reformulate the inference in terms of convolutions, which can be implemented highly efficiently on GPUs. Doing so speeds up inference and training by a factor of more then 100. All parameters of the convolutional CRFs can easily be optimized using backpropagation. To facilitating further CRF research we make our implementation publicly available. Please visit: https://github.com/MarvinTeichmann/ConvCRF

0
7
下载
预览

In this paper, we propose an efficient architecture for semantic image segmentation using the depth-to-space (D2S) operation. Our D2S model is comprised of a standard CNN encoder followed by a depth-to-space reordering of the final convolutional feature maps; thus eliminating the decoder portion of traditional encoder-decoder segmentation models and reducing computation time almost by half. As a participant of the DeepGlobe Road Extraction competition, we evaluate our models on the corresponding road segmentation dataset. Our highly efficient D2S models exhibit comparable performance to standard segmentation models with much less computational cost.

0
7
下载
预览

We propose a novel locally adaptive learning estimator for enhancing the inter- and intra- discriminative capabilities of Deep Neural Networks, which can be used as improved loss layer for semantic image segmentation tasks. Most loss layers compute pixel-wise cost between feature maps and ground truths, ignoring spatial layouts and interactions between neighboring pixels with same object category, and thus networks cannot be effectively sensitive to intra-class connections. Stride by stride, our method firstly conducts adaptive pooling filter operating over predicted feature maps, aiming to merge predicted distributions over a small group of neighboring pixels with same category, and then it computes cost between the merged distribution vector and their category label. Such design can make groups of neighboring predictions from same category involved into estimations on predicting correctness with respect to their category, and hence train networks to be more sensitive to regional connections between adjacent pixels based on their categories. In the experiments on Pascal VOC 2012 segmentation datasets, the consistently improved results show that our proposed approach achieves better segmentation masks against previous counterparts.

0
3
下载
预览

The way that information propagates in neural networks is of great importance. In this paper, we propose Path Aggregation Network (PANet) aiming at boosting information flow in proposal-based instance segmentation framework. Specifically, we enhance the entire feature hierarchy with accurate localization signals in lower layers by bottom-up path augmentation, which shortens the information path between lower layers and topmost feature. We present adaptive feature pooling, which links feature grid and all feature levels to make useful information in each feature level propagate directly to following proposal subnetworks. A complementary branch capturing different views for each proposal is created to further improve mask prediction. These improvements are simple to implement, with subtle extra computational overhead. Our PANet reaches the 1st place in the COCO 2017 Challenge Instance Segmentation task and the 2nd place in Object Detection task without large-batch training. It is also state-of-the-art on MVD and Cityscapes.

0
3
下载
预览

Spatial pyramid pooling module or encode-decoder structure are used in deep neural networks for semantic segmentation task. The former networks are able to encode multi-scale contextual information by probing the incoming features with filters or pooling operations at multiple rates and multiple effective fields-of-view, while the latter networks can capture sharper object boundaries by gradually recovering the spatial information. In this work, we propose to combine the advantages from both methods. Specifically, our proposed model, DeepLabv3+, extends DeepLabv3 by adding a simple yet effective decoder module to refine the segmentation results especially along object boundaries. We further explore the Xception model and apply the depthwise separable convolution to both Atrous Spatial Pyramid Pooling and decoder modules, resulting in a faster and stronger encoder-decoder network. We demonstrate the effectiveness of the proposed model on the PASCAL VOC 2012 semantic image segmentation dataset and achieve a performance of 89% on the test set without any post-processing. Our paper is accompanied with a publicly available reference implementation of the proposed models in Tensorflow.

0
7
下载
预览

Image segmentation is considered to be one of the critical tasks in hyperspectral remote sensing image processing. Recently, convolutional neural network (CNN) has established itself as a powerful model in segmentation and classification by demonstrating excellent performances. The use of a graphical model such as a conditional random field (CRF) contributes further in capturing contextual information and thus improving the segmentation performance. In this paper, we propose a method to segment hyperspectral images by considering both spectral and spatial information via a combined framework consisting of CNN and CRF. We use multiple spectral cubes to learn deep features using CNN, and then formulate deep CRF with CNN-based unary and pairwise potential functions to effectively extract the semantic correlations between patches consisting of three-dimensional data cubes. Effective piecewise training is applied in order to avoid the computationally expensive iterative CRF inference. Furthermore, we introduce a deep deconvolution network that improves the segmentation masks. We also introduce a new dataset and experimented our proposed method on it along with several widely adopted benchmark datasets to evaluate the effectiveness of our method. By comparing our results with those from several state-of-the-art models, we show the promising potential of our method.

0
10
下载
预览

Convolutional networks are powerful visual models that yield hierarchies of features. We show that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, exceed the state-of-the-art in semantic segmentation. Our key insight is to build "fully convolutional" networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning. We define and detail the space of fully convolutional networks, explain their application to spatially dense prediction tasks, and draw connections to prior models. We adapt contemporary classification networks (AlexNet, the VGG net, and GoogLeNet) into fully convolutional networks and transfer their learned representations by fine-tuning to the segmentation task. We then define a novel architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations. Our fully convolutional network achieves state-of-the-art segmentation of PASCAL VOC (20% relative improvement to 62.2% mean IU on 2012), NYUDv2, and SIFT Flow, while inference takes one third of a second for a typical image.

0
3
下载
预览
小贴士
相关论文
Learning Dynamic Routing for Semantic Segmentation
Yanwei Li,Lin Song,Yukang Chen,Zeming Li,Xiangyu Zhang,Xingang Wang,Jian Sun
6+阅读 · 2020年3月23日
MDU-Net: Multi-scale Densely Connected U-Net for biomedical image segmentation
Jiawei Zhang,Yuzhen Jin,Jilan Xu,Xiaowei Xu,Yanchun Zhang
8+阅读 · 2018年12月4日
Panoptic Segmentation with a Joint Semantic and Instance Segmentation Network
Daan de Geus,Panagiotis Meletis,Gijs Dubbelman
4+阅读 · 2018年9月6日
Marvin T. T. Teichmann,Roberto Cipolla
7+阅读 · 2018年5月15日
Shubhra Aich,William van der Kamp,Ian Stavness
7+阅读 · 2018年5月1日
Jinjiang Guo,Pengyuan Ren,Aiguo Gu,Jian Xu,Weixin Wu
3+阅读 · 2018年4月16日
Shu Liu,Lu Qi,Haifang Qin,Jianping Shi,Jiaya Jia
3+阅读 · 2018年3月5日
Liang-Chieh Chen,Yukun Zhu,George Papandreou,Florian Schroff,Hartwig Adam
7+阅读 · 2018年2月7日
Fahim Irfan Alam,Jun Zhou,Alan Wee-Chung Liew,Xiuping Jia,Jocelyn Chanussot,Yongsheng Gao
10+阅读 · 2017年12月27日
Jonathan Long,Evan Shelhamer,Trevor Darrell
3+阅读 · 2015年3月8日
相关资讯
PyTorch语义分割开源库semseg
极市平台
18+阅读 · 2019年6月6日
Transferring Knowledge across Learning Processes
CreateAMind
8+阅读 · 2019年5月18日
DL | 语义分割综述
机器学习算法与Python学习
54+阅读 · 2019年3月13日
TorchSeg:基于pytorch的语义分割算法开源了
极市平台
19+阅读 · 2019年1月28日
Unsupervised Learning via Meta-Learning
CreateAMind
32+阅读 · 2019年1月3日
《pyramid Attention Network for Semantic Segmentation》
统计学习与视觉计算组
40+阅读 · 2018年8月30日
【推荐】全卷积语义分割综述
机器学习研究会
17+阅读 · 2017年8月31日
Top