Currently, under supervised learning, the dominant knowledge transfer paradigm is to pretrain a model on a large-scale natural scene dataset and then fine-tune it on a small amount of task-specific labeled data; this has become the consensus solution for task-aware model training in the remote sensing domain (RSD). Unfortunately, owing to the heterogeneous categories of imaging data and the severe challenges of data annotation, no remote sensing dataset is large and uniform enough to support large-scale pretraining in the RSD. Moreover, pretraining models on large-scale natural scene datasets by supervised learning and then directly fine-tuning them on diverse downstream tasks is a rather crude procedure, easily affected by inevitable labeling noise, severe domain gaps and task-aware discrepancies. Thus, in this paper, building on self-supervised pretraining and the powerful vision transformer (ViT) architecture, a concise and effective knowledge transfer learning strategy called ConSecutive PreTraining (CSPT) is proposed, inspired by the idea of not stopping pretraining in natural language processing (NLP); it gradually bridges the domain gap and transfers knowledge from the natural scene domain to the RSD. The proposed CSPT can also release the huge potential of unlabeled data for task-aware model training. Finally, extensive experiments are carried out on twelve datasets in the RSD, involving three types of downstream tasks (i.e., scene classification, object detection and land cover classification) and two types of imaging data (i.e., optical and SAR). The results show that, by employing the proposed CSPT for task-aware model training, almost all downstream tasks in the RSD outperform the previous supervised pretraining-then-fine-tuning approach and even surpass the state-of-the-art (SOTA) performance, without any expensive labeling consumption or careful model design.
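The abstract describes the CSPT workflow only at a high level; the following is a minimal, illustrative PyTorch sketch of the three-stage idea (natural-scene pretraining, consecutive self-supervised pretraining on unlabeled remote sensing images via masked-patch reconstruction, then task-aware fine-tuning). The `TinyViT` encoder, `mim_step` objective, hyperparameters, and the random tensors standing in for real dataloaders are hypothetical placeholders, not the authors' implementation.

```python
# Minimal sketch of a consecutive-pretraining pipeline (illustrative only; the
# encoder, decoder head, masking scheme and hyperparameters are assumptions,
# not the paper's exact design).
import torch
import torch.nn as nn

class TinyViT(nn.Module):
    """Toy ViT-style encoder: patch embedding + transformer blocks."""
    def __init__(self, img_size=224, patch=16, dim=384, depth=6, heads=6):
        super().__init__()
        self.num_patches = (img_size // patch) ** 2
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.pos_embed = nn.Parameter(torch.zeros(1, self.num_patches, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, depth)

    def forward(self, x):
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)  # (B, N, dim)
        return self.blocks(tokens + self.pos_embed)

def mim_step(encoder, decoder, mask_token, images, mask_ratio=0.75, patch=16):
    """One masked-image-modeling step: replace a random subset of patch tokens
    with a learned mask token, encode, and regress the masked patches' pixels."""
    b = images.shape[0]
    tokens = encoder.patch_embed(images).flatten(2).transpose(1, 2)  # (B, N, D)
    n = tokens.shape[1]
    mask = torch.rand(b, n, device=images.device) < mask_ratio       # (B, N)
    tokens = torch.where(mask.unsqueeze(-1), mask_token.expand(b, n, -1), tokens)
    pred = decoder(encoder.blocks(tokens + encoder.pos_embed))       # (B, N, 3*p*p)
    # Ground-truth pixels per patch, in the same row-major order as patch_embed.
    target = images.unfold(2, patch, patch).unfold(3, patch, patch)
    target = target.permute(0, 2, 3, 1, 4, 5).reshape(b, n, -1)
    return ((pred - target) ** 2)[mask].mean()  # loss only on masked patches

# Stage 1 (assumed done elsewhere): initialize the encoder from a model already
# pretrained on a large-scale natural scene dataset such as ImageNet.
encoder = TinyViT()
decoder = nn.Linear(384, 16 * 16 * 3)             # lightweight pixel-regression head
mask_token = nn.Parameter(torch.zeros(1, 1, 384))

# Stage 2: consecutive SELF-SUPERVISED pretraining on unlabeled remote sensing
# imagery (a random tensor stands in for a real unlabeled RS dataloader).
params = list(encoder.parameters()) + list(decoder.parameters()) + [mask_token]
opt = torch.optim.AdamW(params, lr=1.5e-4)
unlabeled_batch = torch.rand(4, 3, 224, 224)
loss = mim_step(encoder, decoder, mask_token, unlabeled_batch)
loss.backward()
opt.step()
opt.zero_grad()

# Stage 3: task-aware fine-tuning of the adapted encoder on labeled data
# (scene classification shown; detection/segmentation would swap the head).
head = nn.Linear(384, 45)                         # e.g. a 45-class RS scene dataset
logits = head(encoder(torch.rand(4, 3, 224, 224)).mean(dim=1))
```

The key point this sketch captures is that stage 2 requires no labels at all, which is how the strategy can exploit abundant unlabeled remote sensing data before the labeled fine-tuning stage.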