

Semantic Segmentation

1. [Semantic Segmentation] Deep High-Resolution Representation Learning for Visual Recognition


Authors: Jingdong Wang, Ke Sun, Tianheng Cheng, Borui Jiang, Chaorui Deng, Yang Zhao, Dong Liu, Yadong Mu, Mingkui Tan, Xinggang Wang, Wenyu Liu, Bin Xiao

High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection. Existing state-of-the-art frameworks first encode the input image as a low-resolution representation through a subnetwork formed by connecting high-to-low resolution convolutions in series (e.g., ResNet, VGGNet), and then recover the high-resolution representation from the encoded low-resolution representation. Instead, our proposed network, named High-Resolution Network (HRNet), maintains high-resolution representations through the whole process. It has two key characteristics: (i) the high-to-low resolution convolution streams are connected in parallel; (ii) information is repeatedly exchanged across resolutions. The benefit is that the resulting representation is semantically richer and spatially more precise. We show the superiority of the proposed HRNet in a wide range of applications, including human pose estimation, semantic segmentation, and object detection, suggesting that HRNet is a stronger backbone for computer vision problems.
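
The "parallel streams plus repeated exchange" idea can be illustrated with a minimal sketch in plain Python. It is not the HRNet implementation: it assumes only two single-channel streams (a high-resolution grid and a half-resolution grid), and it replaces HRNet's learned convolutions with parameter-free stand-ins (2x2 average pooling for downsampling, nearest-neighbour copying for upsampling). The function names are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[float]]  # one single-channel feature map, indexed [row][col]


def downsample2x(x: Grid) -> Grid:
    """Halve the resolution with 2x2 average pooling (stand-in for a strided conv)."""
    return [[(x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 4.0
             for j in range(0, len(x[0]), 2)]
            for i in range(0, len(x), 2)]


def upsample2x(x: Grid) -> Grid:
    """Double the resolution by copying each value into a 2x2 block."""
    return [[x[i // 2][j // 2] for j in range(2 * len(x[0]))]
            for i in range(2 * len(x))]


def add(a: Grid, b: Grid) -> Grid:
    """Element-wise sum of two grids with the same shape."""
    return [[p + q for p, q in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def exchange(high: Grid, low: Grid) -> Tuple[Grid, Grid]:
    """One fusion step between two parallel streams.

    Each stream keeps its own resolution and adds in the other stream's
    features after resampling them to that resolution.
    """
    new_high = add(high, upsample2x(low))
    new_low = add(low, downsample2x(high))
    return new_high, new_low


if __name__ == "__main__":
    high = [[float(4 * i + j) for j in range(4)] for i in range(4)]  # 4x4 stream
    low = [[10.0, 20.0], [30.0, 40.0]]                              # 2x2 stream
    new_high, new_low = exchange(high, low)
    print(new_low)      # [[12.5, 24.5], [40.5, 52.5]]
    print(new_high[0])  # [10.0, 11.0, 22.0, 23.0]
```

After the exchange the high-resolution stream is still 4x4: it has absorbed coarse context without ever being collapsed to the low resolution, which is the property the abstract emphasises.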



2. [Semantic Segmentation] ResNeSt: Split-Attention Networks


Authors: Hang Zhang, Chongruo Wu, Zhongyue Zhang, Yi Zhu, Haibin Lin, Zhi Zhang, Yue Sun, Tong He, Jonas Mueller, R. Manmatha, Mu Li, Alexander Smola

It is well known that feature-map attention and multi-path representation are important for visual recognition. In this paper, we present a modularized architecture that applies channel-wise attention on different network branches to leverage their success in capturing cross-feature interactions and learning diverse representations. Our design results in a simple and unified computation block, which can be parameterized using only a few variables. Our model, named ResNeSt, outperforms EfficientNet in the accuracy/latency trade-off on image classification. In addition, when used as a backbone, ResNeSt achieves superior transfer-learning results on several public benchmarks, and it has been adopted by the winning entries of the COCO-LVIS challenge. The source code for the complete system and the pretrained models are publicly available.
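
The channel-wise attention across branches can be sketched as follows, assuming a single cardinal group with R splits (the radix) and replacing the block's learned layers with one hypothetical weight matrix per split; the convolutions, normalisation, and nonlinearity of the real block are omitted. The splits are summed, pooled into a per-channel descriptor, turned into per-split logits, normalised with a softmax across splits for each channel, and finally used to weight the splits. The function name and weight layout are illustrative assumptions.

```python
import math
from typing import List

FeatureMap = List[List[float]]  # indexed [channel][spatial position]


def split_attention(splits: List[FeatureMap], weights: List[List[List[float]]]) -> FeatureMap:
    """Fuse R splits with channel-wise attention (radix R, one cardinal group).

    splits:  R feature maps, each C channels x N positions.
    weights: one hypothetical C x C matrix per split that maps the pooled
             descriptor to that split's attention logits.
    """
    radix, channels, positions = len(splits), len(splits[0]), len(splits[0][0])

    # 1) Sum the splits element-wise, then global-average-pool each channel.
    pooled = [sum(s[c][n] for s in splits for n in range(positions)) / positions
              for c in range(channels)]

    # 2) Per-split logits for each channel (a single dense layer in this sketch).
    logits = [[sum(w[c][k] * pooled[k] for k in range(channels)) for c in range(channels)]
              for w in weights]

    # 3) Softmax across the R splits, separately for every channel.
    attn = [[0.0] * channels for _ in range(radix)]
    for c in range(channels):
        peak = max(logits[r][c] for r in range(radix))  # subtract max for numerical stability
        exps = [math.exp(logits[r][c] - peak) for r in range(radix)]
        total = sum(exps)
        for r in range(radix):
            attn[r][c] = exps[r] / total

    # 4) Attention-weighted sum of the splits.
    return [[sum(attn[r][c] * splits[r][c][n] for r in range(radix)) for n in range(positions)]
            for c in range(channels)]


if __name__ == "__main__":
    splits = [[[1.0, 3.0]], [[5.0, 7.0]]]  # R=2 splits, C=1 channel, N=2 positions
    print(split_attention(splits, [[[0.0]], [[0.0]]]))  # equal attention: [[3.0, 5.0]]
```

With all-zero weights every split receives the same attention, so the output reduces to the mean of the splits; non-zero weights let each channel favour whichever branch is most useful.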



3. [Semantic Segmentation] Microsoft COCO: Common Objects in Context

Authors: Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, Piotr Dollár

We present a new dataset with the goal of advancing the state of the art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding. This is achieved by gathering images of complex everyday scenes containing common objects in their natural context. Objects are labeled using per-instance segmentations to aid in precise object localization. Our dataset contains photos of 91 object types that would be easily recognizable by a 4-year-old. With a total of 2.5 million labeled instances in 328k images, the creation of our dataset drew upon extensive crowd worker involvement via novel user interfaces for category detection, instance spotting, and instance segmentation. We present a detailed statistical analysis of the dataset in comparison to PASCAL, ImageNet, and SUN. Finally, we provide baseline performance analysis for bounding box and segmentation detection results using a Deformable Parts Model.



4. [Semantic Segmentation] Attention U-Net: Learning Where to Look for the Pancreas

Authors: Ozan Oktay, Jo Schlemper, Loic Le Folgoc, Matthew Lee, Mattias Heinrich, Kazunari Misawa, Kensaku Mori, Steven McDonagh, Nils Y Hammerla, Bernhard Kainz, Ben Glocker, Daniel Rueckert

We propose a novel attention gate (AG) model for medical imaging that automatically learns to focus on target structures of varying shapes and sizes. Models trained with AGs implicitly learn to suppress irrelevant regions in an input image while highlighting salient features useful for a specific task. This eliminates the need for the explicit external tissue/organ localisation modules used by cascaded convolutional neural networks (CNNs). AGs can be easily integrated into standard CNN architectures such as the U-Net model with minimal computational overhead, while increasing model sensitivity and prediction accuracy. The proposed Attention U-Net architecture is evaluated on two large abdominal CT datasets for multi-class image segmentation. Experimental results show that AGs consistently improve the prediction performance of U-Net across different datasets and training sizes while preserving computational efficiency. The code for the proposed architecture is publicly available.
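
How a gate can suppress some pixels and keep others is easiest to see in a minimal per-pixel sketch of additive attention: the skip-connection features x and a gating signal g are projected, added, passed through ReLU, collapsed to one scalar, and squashed by a sigmoid into a coefficient that scales x. It assumes the gating signal has already been resampled to the skip connection's resolution, and it stands in small matrices for the learned 1x1 convolutions; the function names and all weight values are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def matvec(w: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def attention_gate(x: List[Vector], g: List[Vector], w_x: Matrix, w_g: Matrix,
                   psi: Vector, bias: float = 0.0) -> Tuple[List[Vector], List[float]]:
    """Additive attention gate applied pixel by pixel.

    x: skip-connection features, one vector per pixel
    g: gating-signal features for the same pixels (already resampled)
    Returns the gated features and the per-pixel attention coefficients.
    """
    gated, alphas = [], []
    for xi, gi in zip(x, g):
        # Project both inputs into an intermediate space, add them, apply ReLU.
        hidden = [max(0.0, a + b) for a, b in zip(matvec(w_x, xi), matvec(w_g, gi))]
        # Collapse to one scalar per pixel and squash it into (0, 1) with a sigmoid.
        q = sum(p * h for p, h in zip(psi, hidden)) + bias
        alpha = 1.0 / (1.0 + math.exp(-q))
        alphas.append(alpha)
        gated.append([alpha * v for v in xi])  # scale the skip features
    return gated, alphas


if __name__ == "__main__":
    x = [[1.0, 2.0], [1.0, 2.0]]  # two pixels with identical skip features
    g = [[1.0], [-1.0]]           # the gating signal differs between them
    gated, alphas = attention_gate(x, g, w_x=[[0.0, 0.0]], w_g=[[5.0]], psi=[1.0], bias=-2.0)
    print(alphas)  # first pixel ~0.95 (kept), second pixel ~0.12 (suppressed)
```

Because the coefficient multiplies the skip features rather than replacing them, the gate adds only a few parameters per decoder level, in line with the "minimal computational overhead" claim.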

5. [Semantic Segmentation] Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding

Authors: Alex Kendall, Vijay Badrinarayanan, Roberto Cipolla

We present a deep learning framework for probabilistic pixel-wise semantic segmentation, which we term Bayesian SegNet. Semantic segmentation is an important tool for visual scene understanding, and a meaningful measure of uncertainty is essential for decision making. Our contribution is a practical system that predicts pixel-wise class labels together with a measure of model uncertainty. We achieve this by Monte Carlo sampling with dropout at test time to generate a posterior distribution of pixel class labels. In addition, we show that modelling uncertainty improves segmentation performance by 2-3% across a number of state-of-the-art architectures such as SegNet, FCN, and Dilation Network, with no additional parametrisation. We also observe a significant improvement in performance for smaller datasets, where modelling uncertainty is more effective. We benchmark Bayesian SegNet on the indoor SUN Scene Understanding and outdoor CamVid driving scenes datasets.
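
Test-time Monte Carlo dropout can be sketched on a toy per-pixel classifier: dropout stays active at inference, several stochastic forward passes are averaged to approximate the posterior class probabilities, and the spread across passes serves as the uncertainty estimate. The linear classifier, its weights, the dropout rate, the sample count, and the use of per-class variance as the uncertainty measure are all assumptions for illustration; Bayesian SegNet itself samples the full encoder-decoder network. The function names are hypothetical.

```python
import math
import random
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def stochastic_forward(features: List[float], weights: List[List[float]],
                       p: float, rng: random.Random) -> List[float]:
    """One pass of a toy linear classifier with (inverted) dropout kept ON; needs 0 <= p < 1."""
    kept = [f / (1.0 - p) if rng.random() >= p else 0.0 for f in features]
    logits = [sum(w * k for w, k in zip(row, kept)) for row in weights]
    return softmax(logits)


def mc_dropout_predict(features: List[float], weights: List[List[float]], p: float = 0.5,
                       samples: int = 50, seed: int = 0) -> Tuple[int, List[float], List[float]]:
    """Average `samples` stochastic passes; return (label, mean probs, per-class variance)."""
    rng = random.Random(seed)
    draws = [stochastic_forward(features, weights, p, rng) for _ in range(samples)]
    n_classes = len(weights)
    mean = [sum(d[c] for d in draws) / samples for c in range(n_classes)]
    var = [sum((d[c] - mean[c]) ** 2 for d in draws) / samples for c in range(n_classes)]
    label = max(range(n_classes), key=lambda c: mean[c])  # argmax of the averaged probabilities
    return label, mean, var


if __name__ == "__main__":
    label, mean, var = mc_dropout_predict([1.0, 1.0], [[2.0, 2.0], [0.0, 0.0]])
    print(label)      # 0, the class favoured by the weights
    print(mean, var)  # averaged probabilities and their spread across samples
```

Setting `p=0.0` makes every pass identical, so the variance collapses to (numerically) zero; the non-zero variance at `p=0.5` is what the sampling adds on top of an ordinary deterministic prediction.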


