Semantic segmentation is a critical technology for autonomous vehicles to understand the surrounding scenes. For practical autonomous driving, it is undesirable to spend a considerable amount of inference time to obtain high-accuracy segmentation results. Using lightweight architectures (encoder-decoder or two-pathway) or reasoning on low-resolution images, recent methods achieve very fast scene parsing, even running at more than 100 FPS on a single 1080Ti GPU. However, there is still a significant performance gap between these real-time methods and models based on dilation backbones. To tackle this problem, we propose novel deep dual-resolution networks (DDRNets) for real-time semantic segmentation of road scenes. In addition, we design a new contextual information extractor named the Deep Aggregation Pyramid Pooling Module (DAPPM) to enlarge effective receptive fields and fuse multi-scale context. Our method achieves a new state-of-the-art trade-off between accuracy and speed on both the Cityscapes and CamVid datasets. Specifically, on a single 2080Ti GPU, DDRNet-23-slim yields 77.4% mIoU at 109 FPS on the Cityscapes test set and 74.4% mIoU at 230 FPS on the CamVid test set. Without using attention mechanisms, pre-training on larger semantic segmentation datasets, or inference acceleration, DDRNet-39 attains 80.4% test mIoU at 23 FPS on Cityscapes. With widely used test augmentation, our method is still superior to most state-of-the-art models while requiring much less computation. Code and trained models will be made publicly available.