In this work, we develop methods for few-shot image classification from the new perspective of optimal matching between image regions. We employ the Earth Mover's Distance (EMD) as a metric to compute a structural distance between dense image representations and thereby determine image relevance. The EMD yields the optimal matching flows between structural elements, that is, the flows with the minimum total matching cost, and this cost is used as the image distance for classification. To generate the importance weights of elements in the EMD formulation, we design a cross-reference mechanism, which effectively alleviates the adverse impact of cluttered backgrounds and large intra-class appearance variations. To implement k-shot classification, we propose to learn a structured fully connected layer that can directly classify dense image representations with the EMD. Based on the implicit function theorem, the EMD can be inserted as a layer into the network for end-to-end training. Our extensive experiments validate the effectiveness of our algorithm, which outperforms state-of-the-art methods by a significant margin on five widely used few-shot classification benchmarks, namely, miniImageNet, tieredImageNet, Fewshot-CIFAR100 (FC100), Caltech-UCSD Birds-200-2011 (CUB), and CIFAR-FewShot (CIFAR-FS). We also demonstrate the effectiveness of our method on the image retrieval task.