Hyperspectral image (HSI) classification assigns each pixel in an HSI to a specific land cover class, a task that is crucial for applications such as remote sensing, environmental monitoring, and agriculture. Although deep learning-based HSI classification methods have advanced significantly, they still rely on manually labeled training data, which is time-consuming and labor-intensive to produce. To address this limitation, we introduce a novel zero-shot hyperspectral image classification framework based on CLIP (SPECIAL), which aims to eliminate the need for manual annotations. The SPECIAL framework consists of two main stages: (1) CLIP-based pseudo-label generation, and (2) noisy label learning. In the first stage, the HSI is spectrally interpolated to produce RGB bands. These bands are then classified using CLIP, yielding noisy pseudo-labels accompanied by confidence scores. To improve the quality of these labels, we propose a scaling strategy that fuses predictions from multiple spatial scales. In the second stage, spectral information and a label refinement technique are incorporated to mitigate label noise and further enhance classification accuracy. Experimental results on three benchmark datasets demonstrate that SPECIAL outperforms existing methods in zero-shot HSI classification, indicating its potential for practical applications. The code is available at https://github.com/LiPang/SPECIAL.
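The first stage described above (spectral interpolation to RGB, then fusion of per-scale predictions into pseudo-labels with confidence scores) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `hsi_to_rgb` and `fuse_scales`, the target wavelengths of 650/550/450 nm, the use of linear interpolation, and fusion by simple averaging are all assumptions made for illustration. The CLIP step itself is omitted; the per-scale class probability maps are assumed to be supplied by whatever CLIP-based classifier is used.

```python
import numpy as np


def hsi_to_rgb(cube, wavelengths, rgb_nm=(650.0, 550.0, 450.0)):
    """Linearly interpolate an HSI cube at three target wavelengths.

    cube: array of shape (H, W, B); wavelengths: ascending array of
    shape (B,) in nm. The default target wavelengths are assumed
    values, not taken from the paper.
    """
    wavelengths = np.asarray(wavelengths, dtype=float)
    channels = []
    for target in rgb_nm:
        # Index of the first band at or above the target, clamped so
        # that a lower neighbour always exists.
        hi = int(np.clip(np.searchsorted(wavelengths, target),
                         1, len(wavelengths) - 1))
        lo = hi - 1
        # Interpolation weight between the two bracketing bands;
        # clipped so out-of-range targets fall back to the edge band.
        t = (target - wavelengths[lo]) / (wavelengths[hi] - wavelengths[lo])
        t = float(np.clip(t, 0.0, 1.0))
        channels.append((1.0 - t) * cube[..., lo] + t * cube[..., hi])
    return np.stack(channels, axis=-1)


def fuse_scales(prob_maps):
    """Fuse per-scale class probability maps into pseudo-labels.

    prob_maps: list of arrays, each of shape (H, W, C), one per
    spatial scale. Averaging is an assumed fusion rule chosen for
    simplicity.
    """
    fused = np.mean(np.stack(prob_maps, axis=0), axis=0)
    labels = fused.argmax(axis=-1)      # pseudo-label per pixel
    confidence = fused.max(axis=-1)     # confidence score per pixel
    return labels, confidence
```

In this sketch the confidence map returned by `fuse_scales` is the fused probability of the winning class, which a second-stage noisy label learner could use to decide which pseudo-labels to trust.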