Real-time semantic segmentation, which can be understood as pixel-level classification of an input image, has broad application prospects, especially in the fast-developing fields of autonomous driving and drone navigation. However, its heavy computational burden and redundant parameters remain obstacles to its technological development. In this paper, we propose a Fast Bilateral Symmetrical Network (FBSNet) to alleviate these challenges. Specifically, FBSNet employs a symmetrical encoder-decoder structure with two branches: a semantic information branch and a spatial detail branch. The semantic information branch is the main branch; its deep architecture captures the contextual information of the input image while providing a sufficient receptive field. The spatial detail branch is a shallow, simple network that establishes local dependencies for each pixel in order to preserve details, which is essential for restoring the original resolution during the decoding phase. A feature aggregation module (FAM) is designed to effectively combine the output features of the two branches. Experimental results on Cityscapes and CamVid show that the proposed FBSNet strikes a good balance between accuracy and efficiency. Specifically, it obtains 70.9\% and 68.9\% mIoU at inference speeds of 90 fps and 120 fps on the two test sets, respectively, using only 0.62 million parameters and a single RTX 2080Ti GPU.
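The accuracy figures above are reported as mIoU (mean intersection over union). As a minimal sketch of how that metric is commonly computed, the snippet below builds a confusion matrix from flattened per-pixel labels and averages the per-class IoU, TP / (TP + FP + FN). The function names `confusion_matrix` and `mean_iou` and the toy labels are illustrative assumptions, not code from the paper, and the sketch omits details such as ignored labels that benchmark evaluation scripts typically handle.

```python
def confusion_matrix(pred, target, num_classes):
    """Count pixels by (true class, predicted class) from flat label lists."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        cm[t][p] += 1
    return cm


def mean_iou(cm):
    """Average per-class IoU = TP / (TP + FP + FN) over classes that occur."""
    num_classes = len(cm)
    ious = []
    for c in range(num_classes):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                                   # true c, predicted otherwise
        fp = sum(cm[r][c] for r in range(num_classes)) - tp    # predicted c, true otherwise
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: four pixels, two classes (hypothetical labels).
pred = [0, 0, 1, 1]
target = [0, 1, 1, 1]
cm = confusion_matrix(pred, target, num_classes=2)
score = mean_iou(cm)  # class 0: 1/2, class 1: 2/3
```

In a full evaluation the confusion matrix would usually be accumulated over every image in the test set before the per-class IoU is taken, rather than averaging per-image scores.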