Semantic segmentation requires both rich spatial information and a sizeable receptive field. However, modern approaches usually compromise spatial resolution to achieve real-time inference speed, which leads to poor performance. In this paper, we address this dilemma with a novel Bilateral Segmentation Network (BiSeNet). We first design a Spatial Path with a small stride to preserve spatial information and generate high-resolution features. Meanwhile, a Context Path with a fast downsampling strategy is employed to obtain a sufficient receptive field. On top of the two paths, we introduce a new Feature Fusion Module to combine their features efficiently. The proposed architecture strikes the right balance between speed and segmentation performance on the Cityscapes, CamVid, and COCO-Stuff datasets. Specifically, for a 2048x1024 input, we achieve 68.4% mean IoU on the Cityscapes test set at 105 FPS on one NVIDIA Titan XP card, which is significantly faster than existing methods with comparable performance.
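The trade-off between the two paths can be made concrete with standard convolution arithmetic. The abstract does not specify layer configurations, so the following is only a sketch under stated assumptions: the Spatial Path is taken to be three 3x3 convolutions with stride 2 (an output stride of 8), and the Context Path is modeled as a hypothetical stack of five stride-2 stages, each followed by a stride-1 3x3 convolution (an output stride of 32). The names `Conv`, `SPATIAL_PATH`, and `CONTEXT_PATH` are illustrative, and the Feature Fusion Module is not modeled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Conv:
    kernel: int
    stride: int
    padding: int


def output_size(size: int, layers: List[Conv]) -> int:
    """Spatial size after applying the layers (standard conv arithmetic, dilation 1)."""
    for layer in layers:
        size = (size + 2 * layer.padding - layer.kernel) // layer.stride + 1
    return size


def receptive_field(layers: List[Conv]) -> Tuple[int, int]:
    """Return (receptive field, cumulative stride) of one output unit of the last layer."""
    rf, jump = 1, 1
    for layer in layers:
        rf += (layer.kernel - 1) * jump  # each layer widens the view by (k - 1) input "jumps"
        jump *= layer.stride             # distance between adjacent outputs, in input pixels
    return rf, jump


# ASSUMPTION: three stride-2 3x3 convs, giving 1/8-resolution features.
SPATIAL_PATH = [Conv(3, 2, 1)] * 3

# HYPOTHETICAL stand-in for a fast-downsampling backbone: five stages of
# (stride-2 conv, stride-1 conv), giving 1/32-resolution features.
CONTEXT_PATH = [layer for _ in range(5) for layer in (Conv(3, 2, 1), Conv(3, 1, 1))]

if __name__ == "__main__":
    for name, path in (("Spatial Path", SPATIAL_PATH), ("Context Path", CONTEXT_PATH)):
        w, h = output_size(2048, path), output_size(1024, path)
        rf, stride = receptive_field(path)
        print(f"{name}: {w}x{h} features, stride {stride}, receptive field {rf}px")
```

Under these assumptions, a 2048x1024 input yields 256x128 features with a 15-pixel receptive field on the Spatial Path, versus 64x32 features with a 187-pixel receptive field on the Context Path. A real pretrained backbone (and any global pooling) would enlarge the Context Path's receptive field further; the point is only that neither path alone provides both high resolution and wide context, which is what motivates fusing them.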