For real-time semantic segmentation, increasing speed while maintaining high resolution is a widely discussed problem. Backbone design and fusion design have always been the two essential parts of real-time semantic segmentation. We aim to design a lightweight network that builds on previous design experience and reaches the level of state-of-the-art real-time semantic segmentation without any pre-training. To achieve this goal, we propose an encoder-decoder architecture that applies a decoder network to a backbone model designed for real-time segmentation tasks, and we design three different ways to fuse semantic and detailed information in the aggregation phase. We conduct extensive experiments on two semantic segmentation benchmarks. Experiments on the Cityscapes and CamVid datasets show that the proposed FRFNet strikes a balance between speed and accuracy. It achieves 76.4\% mean Intersection over Union (mIoU) on the Cityscapes test set at 161 FPS on a single RTX 2080Ti card. The code is available at https://github.com/favoMJ/FRFNet.
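For readers unfamiliar with the reported metric, the following is a minimal sketch of how mIoU can be computed from a confusion matrix. It is not taken from the FRFNet code base: the function name `mean_iou` and the toy labels are illustrative assumptions, and the official Cityscapes evaluation additionally ignores void labels, which this sketch does not handle.

```python
def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union over two flat sequences of class labels.

    Hypothetical helper for illustration; not part of FRFNet.
    """
    # conf[t][p] counts pixels whose ground-truth class is t and predicted class is p.
    conf = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        conf[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = conf[c][c]
        fp = sum(conf[t][c] for t in range(num_classes)) - tp  # predicted c, but wrong
        fn = sum(conf[c]) - tp                                 # true c, but missed
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six "pixels", three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, target, num_classes=3))
```

In the toy example the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is approximately 0.5.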