Collecting labeled data for the task of semantic segmentation is expensive and time-consuming, as it requires dense pixel-level annotations. While recent Convolutional Neural Network (CNN) based semantic segmentation approaches have achieved impressive results by using large amounts of labeled training data, their performance drops significantly as the amount of labeled data decreases. This happens because deep CNNs trained with the de facto cross-entropy loss can easily overfit to small amounts of labeled data. To address this issue, we propose a simple and effective contrastive learning-based training strategy in which we first pretrain the network using a pixel-wise, label-based contrastive loss, and then fine-tune it using the cross-entropy loss. This approach increases intra-class compactness and inter-class separability, thereby resulting in a better pixel classifier. We demonstrate the effectiveness of the proposed training strategy using the Cityscapes and PASCAL VOC 2012 segmentation datasets. Our results show that pretraining with the proposed contrastive loss results in large performance gains (more than 20% absolute improvement in some settings) when the amount of labeled data is limited. In many settings, the proposed contrastive pretraining strategy, which does not use any additional data, is able to match or outperform the widely-used ImageNet pretraining strategy that uses more than a million additional labeled images.
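The abstract does not spell out the loss, so the following is only a minimal sketch of one common label-based (supervised) contrastive formulation applied to pixel embeddings, not necessarily the exact loss used in the paper. The function name `pixel_contrastive_loss`, the `temperature` value, and the toy 2-D embeddings are all illustrative assumptions. For each anchor pixel, same-class pixels act as positives and all other pixels appear in the softmax denominator, so minimizing the loss pulls same-class embeddings together and pushes different-class embeddings apart.

```python
import math


def l2_normalize(v):
    # Assumes a non-zero embedding vector.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def pixel_contrastive_loss(features, labels, temperature=0.1):
    """Sketch of a label-based contrastive loss over pixel embeddings.

    features: list of D-dimensional embeddings, one per pixel (hypothetical input).
    labels:   list of integer class labels, one per pixel.
    Returns the mean loss over anchors that have at least one positive.
    """
    feats = [l2_normalize(f) for f in features]
    n = len(feats)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # no same-class pixel to pull this anchor toward
        # Temperature-scaled cosine similarity to every other pixel.
        sims = {
            j: sum(a * b for a, b in zip(feats[i], feats[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Average negative log-probability of the positives for this anchor.
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy check: embeddings grouped by class should score lower than mixed ones.
labels = [0, 0, 1, 1]
clustered = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
mixed = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
loss_clustered = pixel_contrastive_loss(clustered, labels)
loss_mixed = pixel_contrastive_loss(mixed, labels)
print(loss_clustered < loss_mixed)
```

In the two-stage strategy described above, a loss of this kind would be used only during pretraining; the network would then be fine-tuned with the usual cross-entropy loss. A practical implementation would compute this on batched tensors and typically subsample pixels, since the pairwise comparison grows quadratically with the number of pixels.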