Contextual information is critical for many computer vision tasks, and previous works commonly design plug-and-play modules and structural losses to extract and aggregate global context effectively. These methods use fine labels to optimize the model but overlook that well-trained features are also a valuable training resource, one that can provide a preferable distribution for hard pixels (i.e., misclassified pixels). Inspired by contrastive learning in the unsupervised paradigm, we apply the contrastive loss in a supervised manner and redesign the loss function to shed conventions inherited from unsupervised learning (e.g., the imbalance between positives and negatives, and confusion in how anchors are computed). To this end, we propose the Positive-Negative Equal contrastive loss (PNE loss), which increases the latent impact of the positive embedding on the anchor and treats positive and negative sample pairs equally. The PNE loss can be plugged directly into existing semantic segmentation frameworks and leads to excellent performance at negligible extra computational cost. We conduct comprehensive experiments with a number of classic segmentation methods (e.g., DeepLabV3, OCRNet, UperNet) and backbones (e.g., ResNet, HRNet, Swin Transformer) and achieve state-of-the-art performance on two benchmark datasets, Cityscapes and COCO-Stuff. Our code will be publicly available soon.
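The abstract does not state the PNE formula, so the following is not the authors' loss. It is a minimal sketch of the general idea only: a supervised, label-driven contrastive objective in which the positive and negative pairs of each anchor are averaged separately and then weighted equally, so that a large number of negatives cannot swamp a few positives. The function names, the pairwise logistic form, the cosine similarity, and the `temperature` default are all assumptions made for illustration; a practical version would be vectorized in a deep learning framework and applied to sampled pixel embeddings.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def cosine(u, v):
    # Cosine similarity between two embedding vectors (plain lists of floats).
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def balanced_contrastive_loss(embeddings, labels, temperature=0.1):
    """Hypothetical balanced supervised contrastive loss (illustration only).

    For every anchor, same-label samples are positives and different-label
    samples are negatives. Each group is averaged on its own and the two
    means get equal weight, regardless of how many pairs each group has.
    """
    total, n_anchors = 0.0, 0
    for i, (anchor, label) in enumerate(zip(embeddings, labels)):
        pos_terms, neg_terms = [], []
        for j, (other, other_label) in enumerate(zip(embeddings, labels)):
            if i == j:
                continue
            logit = cosine(anchor, other) / temperature
            if other_label == label:
                # -log(sigmoid(logit)): small when the positive is close.
                pos_terms.append(softplus(-logit))
            else:
                # -log(1 - sigmoid(logit)): small when the negative is far.
                neg_terms.append(softplus(logit))
        if not pos_terms or not neg_terms:
            continue  # anchor lacks a positive or a negative; skip it
        pos_mean = sum(pos_terms) / len(pos_terms)
        neg_mean = sum(neg_terms) / len(neg_terms)
        total += 0.5 * pos_mean + 0.5 * neg_mean
        n_anchors += 1
    return total / n_anchors if n_anchors else 0.0
```

As a usage note, calling `balanced_contrastive_loss(embeddings, labels)` on a list of embedding vectors and their class labels returns a single non-negative float; embeddings that cluster by label should score lower than the same embeddings paired with labels that cut across the clusters.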