Contrastive pretraining for semantic segmentation is robust to noisy positive pairs

From arXiv. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. Compared to the previous version, large-scale changes were made to make the paper easier to understand for readers less familiar with contrastive learning and to make certain arguments easier to follow. 10 pages, 9 figures.

Domain-specific variants of contrastive learning can construct positive pairs from two distinct in-domain images, while traditional methods just augment the same image twice. For example, we can form a positive pair from two satellite images showing the same location at different times. Ideally, this teaches the model to ignore changes caused by seasons, weather conditions or image acquisition artifacts. However, unlike in traditional contrastive methods, this can result in undesired positive pairs, since we form them without human supervision. For example, a positive pair might consist of one image before a disaster and one after. This could teach the model to ignore the differences between intact and damaged buildings, which might be what we want to detect in the downstream task. Similar to false negative pairs, this could impede model performance. Crucially, in this setting only parts of the images differ in relevant ways, while other parts remain similar. Surprisingly, we find that downstream semantic segmentation is either robust to such badly matched pairs or even benefits from them. The experiments are conducted on the remote sensing dataset xBD, and a synthetic segmentation dataset for which we have full control over the pairing conditions. As a result, practitioners can use these domain-specific contrastive methods without having to filter their positive pairs beforehand, or might even be encouraged to purposefully include such pairs in their pretraining dataset.
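To make the notion of a positive pair concrete, the following is a minimal sketch of an InfoNCE-style contrastive loss in which each positive pair comes from two distinct images, such as the same location at two acquisition times. The abstract does not say which contrastive objective the authors use, so InfoNCE, the function names `cosine` and `info_nce`, the temperature value, and the toy embeddings are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch of embedding pairs.

    anchors[i] and positives[i] are embeddings of the two images in pair i
    (assumed here: the same location at two different times). Every
    positives[j] with j != i acts as a negative for anchors[i].
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
        # Negative log-probability of picking the true partner.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Hypothetical 2-D embeddings for two locations.
anchors = [[1.0, 0.0], [0.0, 1.0]]
well_matched = [[0.9, 0.1], [0.1, 0.9]]   # partners look alike
badly_matched = [[0.1, 0.9], [0.9, 0.1]]  # partners look like the other location

loss_good = info_nce(anchors, well_matched)
loss_bad = info_nce(anchors, badly_matched)
```

In this toy setup the loss is small when the two images of a pair embed close together and large when they do not. Minimizing it therefore pushes the encoder to map both images of a pair to similar embeddings, which is how a pre-disaster/post-disaster pair could, in principle, teach the model to ignore building damage.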
