Scene change detection (SCD), a crucial perception task, identifies changes by comparing scenes captured at different times. SCD is challenging due to noisy changes in illumination, seasonal variations, and perspective differences across a pair of views. Deep neural network-based solutions require large quantities of annotated data, which are tedious and expensive to obtain. On the other hand, transfer learning from large datasets induces domain shift. To address these challenges, we propose a novel \textit{Differencing self-supervised pretraining (DSP)} method that uses feature differencing to learn discriminative representations corresponding to the changed regions, while simultaneously tackling the noisy changes by enforcing temporal invariance across views. Our experimental results on SCD datasets demonstrate the effectiveness of our method, particularly under differences in camera viewpoint and lighting conditions. Compared against self-supervised Barlow Twins and standard ImageNet pretraining, which uses more than a million additional labeled images, DSP surpasses both without using any additional data. Our results also demonstrate the robustness of DSP to natural corruptions and distribution shift, as well as its effectiveness when learning from limited labeled data.
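To make the two ingredients of the abstract concrete, the sketch below shows one plausible way to combine feature differencing with a temporal-invariance objective. It is a minimal illustration, not the paper's implementation: the encoder, the projection head, the specific invariance loss (a simple normalized MSE rather than, e.g., the Barlow Twins cross-correlation objective), the variance-style regularizer on the difference features, and the weighting `lam` are all assumptions.

```python
# Hypothetical sketch of a DSP-style pretraining step (PyTorch).
# All module names, loss terms, and hyperparameters are illustrative
# assumptions; the paper's exact objectives may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DSPSketch(nn.Module):
    def __init__(self, encoder: nn.Module, feat_dim: int, proj_dim: int = 128):
        super().__init__()
        self.encoder = encoder  # shared backbone applied to both temporal views
        self.projector = nn.Sequential(  # projection head, as in most SSL methods
            nn.Linear(feat_dim, proj_dim),
            nn.ReLU(),
            nn.Linear(proj_dim, proj_dim),
        )

    def forward(self, x_t0: torch.Tensor, x_t1: torch.Tensor):
        # Encode the two views of the scene captured at different times.
        z0 = self.projector(self.encoder(x_t0))
        z1 = self.projector(self.encoder(x_t1))
        # Feature differencing: changed regions should dominate this signal.
        d = z0 - z1
        return z0, z1, d


def dsp_loss(z0, z1, d, lam: float = 0.5):
    """Combined objective (assumed form):
    - an invariance term pulls the two temporal views together, suppressing
      noisy changes such as illumination, season, and viewpoint;
    - a differencing term keeps the difference features non-collapsed so
      that changed regions remain discriminable."""
    invariance = F.mse_loss(F.normalize(z0, dim=1), F.normalize(z1, dim=1))
    # Variance-style regularizer on the difference features: a stand-in for
    # whatever discriminative objective the method actually uses.
    std = torch.sqrt(d.var(dim=0) + 1e-4)
    differencing = torch.relu(1.0 - std).mean()
    return invariance + lam * differencing
```

The two terms pull in opposite directions by design: the invariance term alone would drive the difference features toward zero, so the regularizer on `d` is needed to keep change-related structure in the representation. How the actual method balances these objectives is not specified in the abstract.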