Self-supervised deep learning approaches to classification have received considerable attention in the academic literature. However, the performance of such methods on remote sensing imagery remains under-explored. In this work, we explore contrastive representation learning methods on the task of imagery-based city classification, an important problem in urban computing. We use satellite and map imagery across two domains, 3 million locations, and more than 1,500 cities. We show that self-supervised methods can build a generalizable representation from as few as 200 cities, with representations achieving over 95% accuracy in unseen cities with minimal additional training. We also find that the domain discrepancy between natural imagery and abstract imagery induces a significant performance gap between such methods and supervised methods on remote sensing imagery. We compare all results against existing supervised models from the academic literature and open-source our models for broader use and further critique.