Deep Dual-resolution Networks for Real-time and Accurate Semantic Segmentation of Road Scenes

From arXiv: 12 pages, 7 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

Semantic segmentation is a key technology that lets autonomous vehicles understand their surroundings. The strong performance of contemporary models usually comes at the expense of heavy computation and long inference time, which is intolerable for self-driving. By using lightweight architectures (encoder-decoder or two-pathway) or by running inference on low-resolution images, recent methods achieve very fast scene parsing, even running at more than 100 FPS on a single 1080Ti GPU. However, a significant performance gap remains between these real-time methods and models based on dilation backbones. To tackle this problem, we propose a family of efficient backbones designed specifically for real-time semantic segmentation. The proposed deep dual-resolution networks (DDRNets) are composed of two deep branches between which multiple bilateral fusions are performed. Additionally, we design a new contextual information extractor, the Deep Aggregation Pyramid Pooling Module (DAPPM), which enlarges effective receptive fields and fuses multi-scale context based on low-resolution feature maps. Our method achieves a new state-of-the-art trade-off between accuracy and speed on both the Cityscapes and CamVid datasets. In particular, on a single 2080Ti GPU, DDRNet-23-slim yields 77.4% mIoU at 102 FPS on the Cityscapes test set and 74.7% mIoU at 230 FPS on the CamVid test set. With widely used test augmentation, our method is superior to most state-of-the-art models while requiring much less computation. Code and trained models are available online.
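The two quantities the abstract reports, mIoU and FPS, can be made concrete with a small sketch. The Python below is illustrative only and is not the authors' evaluation code: the function names `mean_iou` and `frame_latency_ms`, the flat label lists, and the `ignore_index` default of 255 are assumptions chosen for the example. It computes per-class intersection over union, TP / (TP + FP + FN), averages it over the classes that appear, and converts a frame rate into a per-frame time budget.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int],
             num_classes: int, ignore_index: int = 255) -> float:
    """Mean IoU over classes that occur in the prediction or the ground truth.

    Assumes every predicted label lies in [0, num_classes). Pixels whose
    ground-truth label equals ignore_index (a hypothetical default here)
    are skipped entirely.
    """
    intersection = [0] * num_classes  # true positives per class
    union = [0] * num_classes         # TP + FP + FN per class
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        if p == t:
            intersection[t] += 1
            union[t] += 1
        else:
            union[p] += 1  # false positive for the predicted class
            union[t] += 1  # false negative for the true class
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    if not ious:
        raise ValueError("no valid pixels to evaluate")
    return sum(ious) / len(ious)


def frame_latency_ms(fps: float) -> float:
    """Average time available per frame, in milliseconds, at a given FPS."""
    return 1000.0 / fps


if __name__ == "__main__":
    # Toy example with three classes; the last pixel is ignored.
    pred = [0, 0, 1, 1, 2, 2]
    target = [0, 1, 1, 1, 2, 255]
    print(f"toy mIoU: {mean_iou(pred, target, num_classes=3):.4f}")
    # Frame rates quoted in the abstract for DDRNet-23-slim.
    for fps in (102, 230):
        print(f"{fps} FPS is about {frame_latency_ms(fps):.2f} ms per frame")
```

In this toy input the per-class IoUs are 1/2, 2/3, and 1, so the mean is about 0.72. Benchmark evaluations typically accumulate the per-class counts over the whole test set before averaging, which this per-sample sketch does not attempt. The latency helper shows why the reported speeds matter: 102 FPS leaves roughly 9.8 ms per frame, and 230 FPS roughly 4.3 ms.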
