This paper proposes DMSA, an end-to-end unsupervised semantic segmentation architecture based on four loss functions. The framework uses an Atrous Spatial Pyramid Pooling (ASPP) module to enhance feature extraction, together with a dynamic dilation strategy designed to better capture multi-scale context information. In addition, a Pixel-Adaptive Refinement (PAR) module is introduced, which adaptively refines the initial pseudo labels after feature fusion to obtain high-quality pseudo labels. Experiments show that the proposed DMSA framework outperforms existing methods on the saliency dataset. On the COCO 80 dataset, the mIoU is improved by 2.0 and the accuracy by 5.39. On the Pascal VOC 2012 Augmented dataset, the mIoU is improved by 4.9 and the accuracy by 3.4. The convergence speed of the model is also greatly improved after the introduction of the PAR module.
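For readers unfamiliar with the two reported metrics, the following minimal sketch shows how mean intersection-over-union (mIoU) and pixel accuracy are commonly computed from a confusion matrix. It is not the authors' evaluation code. The function names and the toy labels are illustrative assumptions, and the sketch assumes the predicted cluster indices have already been matched to ground-truth classes, a step that unsupervised segmentation usually requires and that is not shown here.

```python
def confusion_matrix(pred, target, num_classes):
    """Count pixels: rows are ground-truth classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        matrix[t][p] += 1
    return matrix


def pixel_accuracy(matrix):
    """Fraction of pixels whose predicted class equals the ground-truth class."""
    correct = sum(matrix[c][c] for c in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


def mean_iou(matrix):
    """Average of per-class IoU = TP / (TP + FP + FN), skipping absent classes."""
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                        # class-c pixels predicted as another class
        fp = sum(matrix[r][c] for r in range(n)) - tp   # other pixels predicted as class c
        union = tp + fp + fn
        if union > 0:
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example (hypothetical data): six flattened pixels, three classes.
target = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(pred, target, num_classes=3)
acc = pixel_accuracy(cm)
miou = mean_iou(cm)
print(f"accuracy={acc:.4f}, mIoU={miou:.4f}")
```

Both metrics are usually reported as percentages, so the sketch's outputs would be multiplied by 100 before comparison with figures such as those quoted above.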