CARD: Semantic Segmentation with Efficient Class-Aware Regularized Decoder

Semantic segmentation has recently achieved notable advances by exploiting "class-level" contextual information during learning. However, these approaches simply concatenate class-level information to pixel features to boost pixel representation learning, which cannot fully utilize intra-class and inter-class contextual information. Moreover, these approaches learn soft class centers from coarse mask predictions, which is prone to error accumulation. To better exploit class-level information, we propose a universal Class-Aware Regularization (CAR) approach that optimizes the intra-class variance and inter-class distance during feature learning, motivated by the fact that humans can recognize an object by itself no matter which other objects it appears with. In addition, we design a dedicated decoder for CAR (CARD), consisting of a novel spatial token mixer and an upsampling module, to maximize CAR's gain over existing baselines while remaining highly efficient in computational cost. Specifically, CAR consists of three novel loss functions: the first encourages more compact representations within each class, the second directly maximizes the distance between different class centers, and the third further increases the distance between pixels and the centers of other classes. Furthermore, the class centers in our approach are generated directly from the ground truth instead of from error-prone coarse predictions. CAR can be applied to most existing segmentation models during training and can largely improve their accuracy at no additional inference overhead. Extensive experiments and ablation studies on multiple benchmark datasets demonstrate that CAR boosts the accuracy of all baseline models by up to 2.23% mIoU, with superior generalization ability. CARD outperforms state-of-the-art approaches on multiple benchmarks with a highly efficient architecture.
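The abstract describes the three CAR objectives and the ground-truth class centers only in words. The NumPy sketch below is one possible reading, not the paper's exact formulation. It assumes: a class center is the mean of the pixel embeddings that the ground-truth mask assigns to that class; the compactness term is a mean squared L2 distance to the pixel's own center; and both separation terms are hinge penalties with a hypothetical `margin`. The function names, `margin`, `weights`, and `ignore_index=255` are illustrative assumptions. A real implementation would use differentiable framework tensors and add these terms to the usual segmentation loss during training only, which is why inference cost is unchanged.

```python
import numpy as np


def class_centers_from_gt(features, labels, num_classes, ignore_index=255):
    """Average pixel embeddings per class using the ground-truth mask.

    features: (N, C) flattened pixel embeddings; labels: (N,) class ids.
    Returns (centers, present): (num_classes, C) centers and a boolean
    mask of classes that actually occur in `labels`.
    """
    valid = labels != ignore_index
    feats, labs = features[valid], labels[valid]
    centers = np.zeros((num_classes, features.shape[1]))
    present = np.zeros(num_classes, dtype=bool)
    for k in np.unique(labs):
        centers[k] = feats[labs == k].mean(axis=0)
        present[k] = True
    return centers, present


def intra_class_loss(features, labels, centers, ignore_index=255):
    """Term 1 (assumed form): mean squared distance of each pixel to its own class center."""
    valid = labels != ignore_index
    if not valid.any():
        return 0.0
    diff = features[valid] - centers[labels[valid]]
    return float(np.mean(np.sum(diff ** 2, axis=1)))


def inter_center_loss(centers, present, margin=1.0):
    """Term 2 (assumed form): hinge penalty when two class centers are closer than `margin`."""
    c = centers[present]
    k = len(c)
    if k < 2:
        return 0.0
    dist = np.linalg.norm(c[:, None, :] - c[None, :, :], axis=-1)  # (k, k)
    off_diag = ~np.eye(k, dtype=bool)
    return float(np.mean(np.maximum(0.0, margin - dist[off_diag])))


def pixel_center_loss(features, labels, centers, present, margin=1.0, ignore_index=255):
    """Term 3 (assumed form): hinge penalty when a pixel is closer than `margin` to another class's center."""
    valid = labels != ignore_index
    feats, labs = features[valid], labels[valid]
    dist = np.linalg.norm(feats[:, None, :] - centers[None, :, :], axis=-1)  # (N, K)
    # Only compare each pixel with centers of *other* classes that are present.
    other = present[None, :] & (np.arange(len(centers))[None, :] != labs[:, None])
    if not other.any():
        return 0.0
    return float(np.mean(np.maximum(0.0, margin - dist[other])))


def car_loss(features, labels, num_classes, weights=(1.0, 1.0, 1.0), margin=1.0, ignore_index=255):
    """Weighted sum of the three terms; `weights` and `margin` are hypothetical hyperparameters."""
    centers, present = class_centers_from_gt(features, labels, num_classes, ignore_index)
    w1, w2, w3 = weights
    return (w1 * intra_class_loss(features, labels, centers, ignore_index)
            + w2 * inter_center_loss(centers, present, margin)
            + w3 * pixel_center_loss(features, labels, centers, present, margin, ignore_index))


# Toy example: four 2-D pixel embeddings, two classes.
demo_features = np.array([[0.0, 0.0], [0.0, 2.0], [3.0, 0.0], [3.0, 2.0]])
demo_labels = np.array([0, 0, 1, 1])
demo_total = car_loss(demo_features, demo_labels, num_classes=2, margin=4.0)
```

In the toy example the class centers are (0, 1) and (3, 1), so each pixel sits at squared distance 1 from its own center, the centers are 3 apart, and every pixel is sqrt(10) from the other center; with `margin=4.0` the total is 1 + 1 + (4 - sqrt(10)). Using ground-truth masks for the centers mirrors the abstract's point that the centers should not depend on an error-prone coarse prediction.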
