Traditional fine-grained image classification typically relies on large-scale training sets with ground-truth annotations. In real-world applications, however, some sub-categories have only a few available samples. In this paper, we propose a novel few-shot fine-grained image classification network (FicNet) built on a multi-frequency neighborhood (MFN) module and a double-cross modulation (DCM) module. The MFN module captures information in both the spatial and frequency domains: it extracts self-similarity and multi-frequency components to produce a multi-frequency structural representation. The DCM module modulates the embedding process with a bi-crisscross component, which accounts for global context information, and double 3D cross-attention components, which account for subtle relationships between categories. Comprehensive experiments on three fine-grained benchmark datasets and two few-shot tasks show that FicNet compares favorably with state-of-the-art methods. In particular, on "Caltech-UCSD Birds" and "Stanford Cars" it achieves classification accuracies of 93.17\% and 95.36\%, respectively, which are even higher than those achieved by general fine-grained image classification methods.
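To make the MFN idea more concrete, the following is a minimal sketch of how self-similarity and multi-frequency components might be computed from a single feature map. It is an illustration only, not the authors' implementation: the abstract does not specify the frequency transform, so a 2D DCT is assumed here, and self-similarity is assumed to be cosine similarity within a small neighborhood. The function names (`dct_basis`, `multi_frequency_components`, `local_self_similarity`) and the parameters `num_freq` and `radius` are hypothetical.

```python
import numpy as np


def dct_basis(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis of shape (n, n); row k is the k-th frequency."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    basis[0] *= np.sqrt(1.0 / n)   # DC row
    basis[1:] *= np.sqrt(2.0 / n)  # remaining rows
    return basis


def multi_frequency_components(feat: np.ndarray, num_freq: int = 2) -> np.ndarray:
    """Assumed frequency branch: keep the lowest num_freq x num_freq 2D DCT
    coefficients of every channel.

    feat: feature map of shape (C, H, W).
    Returns an array of shape (C, num_freq, num_freq).
    """
    _, h, w = feat.shape
    bh, bw = dct_basis(h), dct_basis(w)
    # Separable 2D DCT: project rows onto bh and columns onto bw.
    coeffs = np.einsum("uh,chw,vw->cuv", bh, feat, bw)
    return coeffs[:, :num_freq, :num_freq]


def local_self_similarity(feat: np.ndarray, radius: int = 1) -> np.ndarray:
    """Assumed spatial branch: cosine similarity between each position and
    every position in its (2*radius+1)^2 neighborhood (zero-padded borders).

    feat: feature map of shape (C, H, W).
    Returns an array of shape (H, W, 2*radius+1, 2*radius+1).
    """
    _, h, w = feat.shape
    unit = feat / (np.linalg.norm(feat, axis=0, keepdims=True) + 1e-8)
    padded = np.pad(unit, ((0, 0), (radius, radius), (radius, radius)))
    size = 2 * radius + 1
    sim = np.empty((h, w, size, size))
    for dy in range(size):
        for dx in range(size):
            shifted = padded[:, dy:dy + h, dx:dx + w]
            sim[:, :, dy, dx] = (unit * shifted).sum(axis=0)
    return sim


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feature_map = rng.normal(size=(8, 5, 5))  # stand-in for a backbone output
    print(multi_frequency_components(feature_map).shape)
    print(local_self_similarity(feature_map).shape)
```

In this sketch the two outputs would still need to be fused into a single structural representation; how FicNet does that is not described in the abstract, so it is left out rather than guessed.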