Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model

Large-scale vision foundation models have made significant progress on visual tasks involving natural images, with vision transformers being the primary choice owing to their good scalability and representation ability. However, large-scale models for remote sensing (RS) have not yet been sufficiently explored. In this paper, we resort to plain vision transformers with about 100 million parameters, make the first attempt to propose large vision models tailored to RS tasks, and investigate how such large models perform. To handle the large image sizes and arbitrarily oriented objects in RS images, we propose a new rotated varied-size window attention to replace the original full attention in transformers. It significantly reduces the computational cost and memory footprint while learning better object representations by extracting rich context from the generated diverse windows. Experiments on detection tasks show that our model outperforms all state-of-the-art models, achieving 81.24% mAP on the DOTA-V1.0 dataset. Our models also achieve competitive performance against existing advanced methods on downstream classification and segmentation tasks. Further experiments show the advantages of our models in computational complexity and in data efficiency when transferring to downstream tasks.
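The abstract does not give a formulation of rotated varied-size window attention. The following is a rough illustration only, not the authors' implementation. It assumes that each window starts as a square grid of sampling points and is then stretched, rotated and shifted by per-window parameters. Here those parameters are set by hand; treating them as learned from the window's features in a trained model is our assumption. Bilinear sampling of keys and values at the transformed points, multi-head details and training are all omitted. Every name (`WindowParams`, `base_window_grid`, `transform_window`, `attention_pairs`) is hypothetical. The last function counts query-key pairs to show why window attention is cheaper than full attention.

```python
import math
from dataclasses import dataclass


@dataclass
class WindowParams:
    """Hypothetical per-window transform (assumed; hand-set in this sketch)."""
    scale_x: float = 1.0   # stretch along the window's x-axis
    scale_y: float = 1.0   # stretch along the window's y-axis
    offset_x: float = 0.0  # shift of the window centre, in token units
    offset_y: float = 0.0
    angle: float = 0.0     # rotation about the window centre, in radians


def base_window_grid(cx, cy, size):
    """Sampling points of a square size x size window centred at (cx, cy)."""
    half = (size - 1) / 2.0
    return [(cx + j - half, cy + i - half) for i in range(size) for j in range(size)]


def transform_window(points, cx, cy, p):
    """Stretch, rotate, then shift a window's sampling points about its centre."""
    cos_t, sin_t = math.cos(p.angle), math.sin(p.angle)
    out = []
    for x, y in points:
        # Coordinates relative to the centre, stretched per axis.
        dx = (x - cx) * p.scale_x
        dy = (y - cy) * p.scale_y
        # Rotate about the centre, then move the centre by the offset.
        rx = dx * cos_t - dy * sin_t
        ry = dx * sin_t + dy * cos_t
        out.append((cx + p.offset_x + rx, cy + p.offset_y + ry))
    return out


def attention_pairs(height, width, window=None):
    """Query-key scores computed by one attention layer on a height x width token map."""
    n = height * width
    if window is None:
        return n * n             # full attention: every token attends to every token
    return n * window * window   # window attention: each query sees window**2 sampled keys


# Hypothetical example: a 1024 x 1024 image cut into 16 x 16 patches gives 64 x 64 tokens.
print(attention_pairs(64, 64))            # 16777216
print(attention_pairs(64, 64, window=7))  # 200704
```

Under this assumption, each transformed window still samples a fixed `window**2` set of keys. The cost therefore stays linear in the number of tokens however the window is scaled or rotated. In the 64 x 64 example, that means roughly 84 times fewer query-key scores than full attention.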

