Recognising remote sensing scene images remains challenging due to large visual-semantic discrepancies. These mainly arise from the lack of detailed annotations that could be used to align pixel-level representations with high-level semantic labels. As the tagging process is labour-intensive and subjective, we propose a novel Multi-Granularity Canonical Appearance Pooling (MG-CAP) to automatically capture the latent ontological structure of remote sensing datasets. We design a granular framework that progressively crops the input image to learn multi-grained features. For each specific granularity, we discover the canonical appearance from a set of pre-defined transformations and learn the corresponding CNN features through a maxout-based Siamese-style architecture. We then replace the standard CNN features with Gaussian covariance matrices and apply proper matrix normalisations to improve the discriminative power of the features. In addition, we provide a stable solution for training the eigenvalue decomposition (EIG) function on a GPU and derive the corresponding back-propagation using matrix calculus. Extensive experiments show that our framework achieves promising results on public remote sensing scene datasets.
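To make the canonical-appearance step concrete, below is a minimal PyTorch sketch of maxout over a Siamese-shared backbone. The abstract does not specify the transformation set, so rotations are assumed here purely for illustration; `backbone`, `angles`, and the NCHW input layout are hypothetical choices, not the paper's exact implementation.

```python
import torch

def canonical_appearance_pooling(backbone, image, angles=(0, 90, 180, 270)):
    """Maxout over a set of pre-defined transformations (here: rotations).

    backbone: a CNN with shared weights, applied Siamese-style to every
    transformed copy of the input; the element-wise max keeps, per feature,
    the strongest response across transformations ("canonical appearance").
    image: a (B, C, H, W) batch of input crops.
    """
    feats = torch.stack(
        [backbone(torch.rot90(image, k=a // 90, dims=(2, 3))) for a in angles],
        dim=0,
    )
    return feats.max(dim=0).values
```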
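The covariance-pooling and EIG steps can likewise be sketched. The snippet below assumes a plain sample covariance followed by matrix power normalisation through a differentiable eigen-decomposition (`torch.linalg.eigh`); the paper's precise Gaussian embedding and its stabilised back-propagation may differ, and `eps` and `alpha` are illustrative defaults rather than values from the paper.

```python
import torch

def gaussian_covariance(feat, eps=1e-5):
    """Pool (B, C, H, W) CNN features into (B, C, C) covariance descriptors.

    Each spatial location is treated as one C-dimensional sample. A small
    multiple of the identity keeps the matrix symmetric positive definite,
    which also keeps the subsequent EIG well-conditioned.
    """
    B, C, H, W = feat.shape
    x = feat.reshape(B, C, H * W)
    xc = x - x.mean(dim=2, keepdim=True)            # centre the features
    cov = xc @ xc.transpose(1, 2) / (H * W - 1)     # sample covariance
    return cov + eps * torch.eye(C, device=feat.device, dtype=feat.dtype)

def matrix_power_normalise(cov, alpha=0.5, eps=1e-10):
    """Matrix power normalisation via eigen-decomposition.

    torch.linalg.eigh is differentiable; clamping eigenvalues away from
    zero before the element-wise power is one common way to keep the
    backward pass numerically stable. alpha=0.5 gives the matrix
    square root.
    """
    evals, evecs = torch.linalg.eigh(cov)           # cov = U diag(evals) U^T
    evals = evals.clamp_min(eps).pow(alpha)
    return evecs @ torch.diag_embed(evals) @ evecs.transpose(1, 2)
```

As a usage note, the normalised matrices are symmetric, so in practice only their upper-triangular entries need to be flattened into the final feature vector before the classifier.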