Large pre-trained models, also known as foundation models (FMs), are trained in a task-agnostic manner on large-scale data and can be adapted to a wide range of downstream tasks by fine-tuning, few-shot, or even zero-shot learning. Despite their successes in language and vision tasks, we have yet to see an attempt to develop foundation models for geospatial artificial intelligence (GeoAI). In this work, we explore the promises and challenges of developing multimodal foundation models for GeoAI. We first investigate the potential of many existing FMs by testing their performance on seven tasks across multiple geospatial subdomains, including Geospatial Semantics, Health Geography, Urban Geography, and Remote Sensing. Our results indicate that on several geospatial tasks that involve only the text modality, such as toponym recognition, location description recognition, and US state-level/county-level dementia time series forecasting, task-agnostic large language models (LLMs) can outperform task-specific fully supervised models in a zero-shot or few-shot learning setting. However, on other geospatial tasks, especially tasks that involve multiple data modalities (e.g., POI-based urban function classification, street view image-based urban noise intensity classification, and remote sensing image scene classification), existing foundation models still underperform task-specific models. Based on these observations, we propose that one of the major challenges of developing an FM for GeoAI is to address the multimodal nature of geospatial tasks. After discussing the distinct challenges of each geospatial data modality, we suggest the possibility of a multimodal foundation model that can reason over various types of geospatial data through geospatial alignments. We conclude this paper by discussing the unique risks and challenges of developing such a model for GeoAI.