Real-time semantic segmentation is playing an increasingly important role in computer vision, due to the growing demand from mobile devices and autonomous driving. It is therefore important to achieve a good trade-off among performance, model size, and inference speed. In this paper, we propose a Channel-wise Feature Pyramid (CFP) module to balance these factors. Based on the CFP module, we build CFPNet for real-time semantic segmentation, which applies a series of dilated convolution channels to extract effective features. Experiments on the Cityscapes and CamVid datasets show that the proposed CFPNet achieves an effective combination of these factors. On the Cityscapes test set, CFPNet achieves 70.1% class-wise mIoU with only 0.55 million parameters and 2.5 MB of memory. The inference speed can reach 30 FPS on a single RTX 2080Ti GPU with a 1024×2048-pixel image.
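For readers unfamiliar with the class-wise mIoU metric quoted above, the following is a minimal sketch of how it is commonly computed: the intersection-over-union is taken per class and then averaged over classes. It is not the authors' evaluation code. The function name `mean_iou`, the flat label lists, and the `ignore_index=255` default are assumptions made for illustration.

```python
from collections import Counter


def mean_iou(pred, target, num_classes, ignore_index=255):
    """Class-wise mean IoU over two flat sequences of integer labels.

    Illustrative sketch only; `ignore_index` (assumed to be 255 here)
    marks ground-truth pixels that are excluded from scoring.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixel: not scored
        if p == t:
            tp[t] += 1  # correct pixel for class t
        else:
            fp[p] += 1  # predicted class p where it is not present
            fn[t] += 1  # missed a pixel of class t

    ious = []
    for c in range(num_classes):
        union = tp[c] + fp[c] + fn[c]
        if union == 0:
            continue  # class absent from both prediction and ground truth
        ious.append(tp[c] / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    pred = [0, 0, 1, 1, 2, 2]
    target = [0, 1, 1, 1, 2, 255]
    # Per-class IoU: class 0 = 1/2, class 1 = 2/3, class 2 = 1/1
    print(round(mean_iou(pred, target, num_classes=3), 4))
```

Averaging per class rather than per pixel means small classes weigh as much as large ones, which is why the metric is usually preferred over plain pixel accuracy for street-scene data.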