SSL4Eco: A Global Seasonal Dataset for Geospatial Foundation Models in Ecology

As the biodiversity and climate crises worsen, macroecological pursuits such as global biodiversity mapping are becoming more urgent. Remote sensing offers a wealth of Earth observation data for ecological studies, but the scarcity of labeled datasets remains a major challenge. Recently, self-supervised learning has made it possible to learn representations from unlabeled data, triggering the development of pretrained geospatial models with generalizable features. However, these models are often trained on datasets biased toward areas of high human activity, leaving entire ecological regions underrepresented. Additionally, while some datasets attempt to address seasonality through multi-date imagery, they typically follow calendar seasons rather than local phenological cycles. To better capture vegetation seasonality at a global scale, we propose a simple phenology-informed sampling strategy and introduce the corresponding dataset, SSL4Eco, a multi-date Sentinel-2 dataset on which we train an existing model with a season-contrastive objective. We compare representations learned from SSL4Eco against those learned from other datasets on diverse ecological downstream tasks and demonstrate that our straightforward sampling method consistently improves representation quality, highlighting the importance of dataset construction. The model pretrained on SSL4Eco reaches state-of-the-art performance on 7 out of 8 downstream tasks spanning (multi-label) classification and regression. We release our code, data, and model weights to support macroecological and computer vision research at https://github.com/PlekhanovaElena/ssl4eco.
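The abstract does not spell out how the phenology-informed sampling works, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a hypothetical 12-value monthly vegetation index curve per location and a hypothetical helper `phenology_months` that picks four acquisition months from the local cycle (dormancy, green-up, peak, senescence) instead of fixed calendar seasons. The half-amplitude threshold is an illustrative assumption.

```python
from typing import Dict, List


def phenology_months(vi: List[float]) -> Dict[str, int]:
    """Pick four 0-based month indices from one annual vegetation index cycle.

    Hypothetical helper for illustration only: the phase definitions below
    (minimum, maximum, half-amplitude crossings) are assumptions, not the
    SSL4Eco sampling rule.
    """
    n = len(vi)
    peak = max(range(n), key=lambda m: vi[m])    # month of maximum greenness
    trough = min(range(n), key=lambda m: vi[m])  # month of minimum greenness
    half = (vi[peak] + vi[trough]) / 2.0         # half-amplitude threshold

    def first_crossing(start: int, rising: bool) -> int:
        # Walk forward cyclically from `start` until the curve crosses `half`.
        for step in range(1, n + 1):
            m = (start + step) % n
            crossed = vi[m] >= half if rising else vi[m] <= half
            if crossed:
                return m
        return start  # flat curve: no crossing found

    return {
        "dormancy": trough,
        "greenup": first_crossing(trough, rising=True),
        "peak": peak,
        "senescence": first_crossing(peak, rising=False),
    }


# Synthetic curves (made-up values): a Northern Hemisphere site and the same
# cycle shifted by six months to mimic a Southern Hemisphere site.
north = [0.2, 0.2, 0.3, 0.4, 0.6, 0.8, 0.9, 0.8, 0.6, 0.4, 0.3, 0.2]
south = north[6:] + north[:6]

print(phenology_months(north))
print(phenology_months(south))
```

With these synthetic curves the two sites receive different sampling months for the same phenological phase, whereas a calendar-season scheme would image both in the same months regardless of the local vegetation state.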

