It is well known that semantic segmentation neural networks (SSNNs) produce dense segmentation maps to resolve object boundaries, while restricting predictions to down-sampled grids to alleviate the computational cost. A trade-off therefore exists between the accuracy and the training cost of SSNNs such as U-Net. We propose a spectral analysis to investigate the correlations among the resolution of the down-sampled grid, the loss function, and the accuracy of SSNNs. By analyzing the network's back-propagation process in the frequency domain, we find that the traditional cross-entropy loss and the key features of CNNs are mainly affected by the low-frequency components of the segmentation labels. Our findings can be applied to SSNNs in several ways, including (i) determining an efficient low-resolution grid for resolving segmentation maps, (ii) pruning networks by truncating high-frequency decoder features to save computation costs, and (iii) using block-wise weak annotation to save labeling time. Experimental results reported in this paper agree with our spectral analysis for networks such as DeepLab V3+ and Deep Aggregation Net (DAN).
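The claim that segmentation labels are dominated by low-frequency components can be illustrated with a minimal sketch (not the paper's actual analysis): take a synthetic block-shaped binary label, compute its 2D discrete Fourier transform with NumPy, and measure how much spectral energy falls inside a small low-frequency window. The grid size and window size below are arbitrary choices for illustration.

```python
import numpy as np

# Synthetic binary segmentation label: a filled square on a 64x64 grid.
label = np.zeros((64, 64))
label[16:48, 16:48] = 1.0

# 2D Fourier transform, shifted so the zero frequency sits at the center.
spectrum = np.fft.fftshift(np.fft.fft2(label))
energy = np.abs(spectrum) ** 2

# Fraction of total spectral energy inside the central 16x16
# low-frequency block (frequencies |k| <= 8 along each axis).
c = 32
low = energy[c - 8:c + 8, c - 8:c + 8].sum()
frac = low / energy.sum()
print(f"low-frequency energy fraction: {frac:.3f}")
```

For a piecewise-constant mask like this, the vast majority of the energy concentrates near zero frequency, which is consistent with the intuition that a coarse (down-sampled) grid can already resolve most of the label's information.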