The performance of deep learning models in remote sensing (RS) depends strongly on the availability of high-quality labeled data. However, collecting large-scale annotations is costly and time-consuming, while vast amounts of unlabeled imagery remain underutilized. To address this challenge, we propose a Hierarchical Semi-Supervised Active Learning (HSSAL) framework that integrates semi-supervised learning (SSL) and a novel hierarchical active learning (HAL) strategy in a closed iterative loop. In each iteration, SSL refines the model using labeled data through supervised learning and unlabeled data through weak-to-strong self-training, improving both feature representation and uncertainty estimation. Guided by the refined representations and uncertainty cues of the unlabeled samples, HAL then queries samples through a progressive clustering strategy, selecting the most informative instances that jointly satisfy the criteria of scalability, diversity, and uncertainty. This hierarchical process makes sample selection both efficient and representative. Extensive experiments on three benchmark RS scene classification datasets (UCM, AID, and NWPU-RESISC45) demonstrate that HSSAL consistently outperforms SSL-only and active-learning-only (AL-only) baselines. Remarkably, with only 8%, 4%, and 2% of the training data labeled on UCM, AID, and NWPU-RESISC45, respectively, HSSAL achieves over 95% of fully supervised accuracy, highlighting the label efficiency gained by exploiting the information in unlabeled data. Our code will be publicly available.
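To make the two stages of the loop concrete, the following is a minimal sketch and not the HSSAL implementation, which the abstract does not specify. It assumes a FixMatch-style confidence threshold for the weak-to-strong self-training step, predictive entropy as the uncertainty cue, and a two-level k-means as a stand-in for the progressive clustering strategy. All function names, the threshold of 0.95, and the `n_coarse` parameter are hypothetical choices made for illustration.

```python
import numpy as np


def confident_pseudo_labels(weak_probs, threshold=0.95):
    """Assumed SSL step: keep pseudo-labels from weak-view predictions
    only where the top class probability reaches the threshold."""
    return weak_probs.argmax(axis=1), weak_probs.max(axis=1) >= threshold


def entropy(probs, eps=1e-12):
    """Predictive entropy per sample, used here as the uncertainty score."""
    return -(probs * np.log(probs + eps)).sum(axis=1)


def kmeans(x, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns a cluster id for every row of x."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels


def hierarchical_query(features, probs, budget, n_coarse=4, seed=0):
    """Assumed HAL step: coarse clusters split the pool (scalability),
    fine clusters spread the picks (diversity), and the most uncertain
    sample in each fine cluster is queried (uncertainty)."""
    uncertainty = entropy(probs)
    coarse = kmeans(features, n_coarse, seed=seed)
    picked = []
    for c in range(n_coarse):
        idx = np.flatnonzero(coarse == c)
        if len(idx) == 0:
            continue
        # Budget share proportional to cluster size, at least one sample.
        share = max(1, round(budget * len(idx) / len(features)))
        share = min(share, len(idx))
        fine = kmeans(features[idx], share, seed=seed)
        for f in range(share):
            sub = idx[fine == f]
            if len(sub) > 0:
                picked.append(sub[uncertainty[sub].argmax()])
    picked = np.array(picked, dtype=int)
    # Rounding can overshoot the budget; keep the most uncertain picks.
    order = np.argsort(-uncertainty[picked])
    return picked[order[:budget]]
```

In a full loop, the indices returned by `hierarchical_query` would be sent for annotation and moved into the labeled set, after which the model is retrained on the labeled data together with the pseudo-labels accepted by `confident_pseudo_labels`. The function returns at most `budget` indices, and may return fewer when some fine clusters end up empty.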