Contextual information is critical for various computer vision tasks; previous works commonly design plug-and-play modules and structural losses to effectively extract and aggregate the global context. These methods use fine-grained labels to optimize the model but ignore that well-trained features are themselves precious training resources, which can introduce a preferable distribution to hard pixels (i.e., misclassified pixels). Inspired by contrastive learning in the unsupervised paradigm, we apply the contrastive loss in a supervised manner and redesign the loss function to cast off the stereotypes of unsupervised learning (e.g., the imbalance of positives and negatives, and confusion in computing anchors). To this end, we propose the Positive-Negative Equal contrastive loss (PNE loss), which increases the latent impact of positive embeddings on the anchor and treats positive and negative sample pairs equally. The PNE loss can be plugged directly into existing semantic segmentation frameworks and delivers excellent performance with negligible extra computational cost. We conduct comprehensive experiments with a number of classic segmentation methods (e.g., DeepLabV3, HRNetV2, OCRNet, UperNet) and backbones (e.g., ResNet, HRNet, Swin Transformer), achieving state-of-the-art performance on three benchmark datasets (Cityscapes, COCO-Stuff, and ADE20K). Our code will be publicly available soon.
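To make the "positive-negative equal" idea concrete, the following is a minimal NumPy sketch of a supervised contrastive loss in which the positive set and the negative set each contribute through their mean similarity, so neither side dominates simply because it contains more pairs. The exact PNE formulation is not given in the abstract, so the function below (`pne_loss_sketch`, its temperature `tau`, and the mean-based pooling) is an illustrative assumption, not the paper's definition.

```python
import numpy as np

def pne_loss_sketch(anchor, positives, negatives, tau=0.1):
    """Hypothetical positive-negative-equal contrastive loss.

    anchor:    (d,) embedding of one hard (misclassified) pixel.
    positives: (p, d) embeddings of same-class pixels.
    negatives: (n, d) embeddings of other-class pixels.

    Positives and negatives are pooled by their MEAN exponentiated
    similarity, so each set is weighted equally regardless of size
    (assumed interpretation of "treating positives and negatives
    equally"; not the paper's exact formula).
    """
    def cos_sim(a, b):
        a = a / np.linalg.norm(a)
        b = b / np.linalg.norm(b, axis=1, keepdims=True)
        return b @ a  # cosine similarity of each row of b with a

    pos = np.exp(cos_sim(anchor, positives) / tau).mean()
    neg = np.exp(cos_sim(anchor, negatives) / tau).mean()
    # InfoNCE-style ratio: small when the anchor sits near its positives.
    return -np.log(pos / (pos + neg))

# An anchor aligned with its positives incurs a lower loss than one
# aligned with its negatives.
anchor_good = np.array([1.0, 0.0])
anchor_bad = np.array([-1.0, 0.1])
positives = np.array([[0.9, 0.1], [1.0, -0.1]])
negatives = np.array([[-1.0, 0.0], [-0.9, 0.2]])

loss_good = pne_loss_sketch(anchor_good, positives, negatives)
loss_bad = pne_loss_sketch(anchor_bad, positives, negatives)
```

In a segmentation framework, such a term would typically be added to the standard per-pixel cross-entropy, with anchors drawn from hard pixels as described above.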