Remote sensing (RS) images contain numerous objects at different scales, which makes it challenging for RS image change captioning (RSICC) to identify the visual changes of interest in complex scenes and describe them in natural language. Current methods, however, do not sufficiently extract and utilize multi-scale information. In this paper, we propose a progressive scale-aware network (PSNet) to address this problem. PSNet is a pure Transformer-based model. To extract multi-scale visual features, we stack multiple progressive difference perception (PDP) layers that progressively exploit the difference features of the bitemporal features. To utilize the extracted multi-scale features for captioning, we propose a scale-aware reinforcement (SR) module and combine it with the Transformer decoding layer so that the decoder progressively uses the features from different PDP layers. Experiments show that the PDP layer and the SR module are effective and that PSNet outperforms previous methods.
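The data flow described in the abstract (stacked layers that each produce difference features, followed by a decoder that reads those features layer by layer) can be illustrated with a toy sketch. This is not the PSNet implementation: the abstract does not specify the internals of the PDP layer or the SR module, so every function below (`avg_pool_pairs`, `toy_difference_layer`, `extract_multiscale_differences`, `toy_progressive_decode`) is a hypothetical stand-in. In particular, plain subtraction and pairwise average pooling are assumptions that replace the Transformer-based operations of the actual model.

```python
from typing import List, Tuple

Vector = List[float]


def avg_pool_pairs(x: Vector) -> Vector:
    """Halve the resolution by averaging neighbouring pairs (toy down-scaling)."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]


def toy_difference_layer(before: Vector, after: Vector) -> Tuple[Vector, Vector, Vector]:
    """Hypothetical stand-in for one PDP layer (assumption, not the paper's design).

    Returns the difference at the current scale, plus coarser versions of the
    two bitemporal features to feed into the next layer.
    """
    diff = [a - b for a, b in zip(after, before)]
    return diff, avg_pool_pairs(before), avg_pool_pairs(after)


def extract_multiscale_differences(before: Vector, after: Vector,
                                   num_layers: int) -> List[Vector]:
    """Stack the toy layers and collect one difference feature per layer."""
    diffs = []
    for _ in range(num_layers):
        diff, before, after = toy_difference_layer(before, after)
        diffs.append(diff)
    return diffs


def toy_progressive_decode(diffs: List[Vector]) -> List[float]:
    """Hypothetical decoder loop: decoding stage i reads the features of layer i.

    The 'state' here is only a running sum of the mean absolute change per
    scale; it stands in for the SR module plus Transformer decoding layer.
    """
    state = 0.0
    trace = []
    for diff in diffs:
        state += sum(abs(d) for d in diff) / len(diff)
        trace.append(state)
    return trace


# Toy bitemporal "features": a change appears at positions 2 and 3.
before = [0, 0, 0, 0, 0, 0, 0, 0]
after = [0, 0, 4, 4, 0, 0, 0, 0]

diffs = extract_multiscale_differences(before, after, num_layers=3)
trace = toy_progressive_decode(diffs)
```

In this sketch the same change shows up at three resolutions (feature lengths 8, 4 and 2), and the decoder state is updated once per layer, which mirrors the progressive use of features from different layers described above. The input length must be at least `2 ** (num_layers - 1)` so that no layer receives an empty feature.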