In recent years, using a deep convolutional neural network (CNN) as a feature encoder (or backbone) has become the most common architectural pattern in computer vision methods, and semantic segmentation is no exception. This pattern has two major drawbacks: (i) the networks often fail to capture small classes such as wall, fence, pole, traffic light, traffic sign, and bicycle, which are crucial for autonomous vehicles to make accurate decisions; and (ii) because of their ever-increasing depth, the networks require massive amounts of labeled data to converge and additional regularization techniques to prevent over-fitting. While regularization techniques come at minimal cost, collecting labeled data is expensive and laborious. In this work, we address these two drawbacks by proposing a novel lightweight architecture named point-wise dense flow network (PDFNet). In PDFNet, we employ dense, residual, and multiple shortcut connections to allow a smooth gradient flow to all parts of the network. Extensive experiments on the Cityscapes and CamVid benchmarks demonstrate that our method significantly outperforms baselines in capturing small classes and in few-data regimes. Moreover, our method achieves considerable performance in classifying out-of-training-distribution samples, evaluated in the Cityscapes-to-KITTI setting.
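The abstract does not describe the layer layout of PDFNet, so the following is only a minimal, generic sketch of how dense and residual connections can be combined around point-wise (1x1) convolutions. It is not the authors' architecture. The names `pointwise_conv`, `dense_residual_block`, and `make_block_params`, as well as the channel count, growth rate, and number of layers, are all assumptions made for illustration.

```python
import numpy as np


def pointwise_conv(x, weight):
    """1x1 convolution: mixes channels independently at every pixel.

    x has shape (C_in, H, W); weight has shape (C_out, C_in).
    """
    return np.einsum("oc,chw->ohw", weight, x)


def relu(x):
    return np.maximum(x, 0.0)


def dense_residual_block(x, weights, fuse_weight):
    """Hypothetical block (an illustration, not PDFNet itself).

    Dense connections: each layer sees the concatenation of the block
    input and all earlier layer outputs.
    Residual connection: the block input is added to the fused output,
    giving gradients a direct path around the block.
    """
    features = [x]
    for w in weights:
        stacked = np.concatenate(features, axis=0)  # dense connection
        features.append(relu(pointwise_conv(stacked, w)))
    # Fuse all features back to the input channel count with a 1x1 conv.
    fused = pointwise_conv(np.concatenate(features, axis=0), fuse_weight)
    return x + fused  # residual shortcut


def make_block_params(channels, growth, num_layers, rng):
    """Random weights with shapes matching dense_residual_block."""
    weights = [
        0.1 * rng.standard_normal((growth, channels + i * growth))
        for i in range(num_layers)
    ]
    fuse_weight = 0.1 * rng.standard_normal(
        (channels, channels + num_layers * growth)
    )
    return weights, fuse_weight


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))  # 8 channels, 4x4 feature map
weights, fuse_weight = make_block_params(8, 4, 3, rng)
y = dense_residual_block(x, weights, fuse_weight)
print(y.shape)  # (8, 4, 4)
```

Because of the residual shortcut, the block reduces to the identity when the fusing weights are zero, which is one way to see why such connections give gradients an unobstructed path to earlier layers.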