In this paper, we propose the Global Context Convolutional Network (GCCN) for visual recognition. GCCN computes global features representing contextual information across image patches. These global contextual features are defined as local maxima pixels with high visual sharpness in each patch. These features are then concatenated and utilised to augment the convolutional features. The learnt feature vector is normalised using the Frobenius norm of the global context features. This straightforward approach achieves high accuracy in comparison to state-of-the-art methods, with 94.6% and 95.41% on the CIFAR-10 and STL-10 datasets, respectively. To explore the potential impact of GCCN on other visual representation tasks, we implement GCCN as a base model for few-shot image classification. We learn metric distances between the augmented feature vectors and their prototype representations, similar to Prototypical and Matching Networks. GCCN outperforms state-of-the-art few-shot learning methods, achieving 99.9%, 84.8% and 80.74% on Omniglot, MiniImageNet and CUB-200, respectively. GCCN improves on the accuracy of state-of-the-art Prototypical and Matching Networks by up to 30% in different few-shot learning scenarios.
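To make the pipeline described above concrete, the sketch below gives one plausible PyTorch reading of it: per-patch local maxima of a sharpness map serve as global context features, which are concatenated with convolutional features, and the result is scaled by the Frobenius norm of the context block. The function names, the Laplacian sharpness proxy, and the patch size are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def global_context_features(images, patch_size=8):
    """Hypothetical sketch of GCCN's global context features.

    images: (B, C, H, W) tensor. Sharpness is approximated here with a
    Laplacian filter response; the paper's exact sharpness measure may differ.
    """
    # Grayscale sharpness proxy: magnitude of a Laplacian response.
    gray = images.mean(dim=1, keepdim=True)                      # (B, 1, H, W)
    lap = torch.tensor([[0., 1., 0.],
                        [1., -4., 1.],
                        [0., 1., 0.]]).view(1, 1, 3, 3)
    sharpness = F.conv2d(gray, lap, padding=1).abs()             # (B, 1, H, W)

    # Local maximum of the sharpness map within each non-overlapping patch.
    patch_max = F.max_pool2d(sharpness, kernel_size=patch_size)  # (B, 1, H/p, W/p)
    return patch_max.flatten(1)                                  # (B, P) context vector

def augment_and_normalise(conv_features, context):
    """Concatenate context with CNN features, then scale by the Frobenius
    norm of the context block (one reading of the abstract's normalisation;
    for a vector, the Frobenius norm equals the 2-norm)."""
    fused = torch.cat([conv_features, context], dim=1)
    norm = context.norm(p=2, dim=1, keepdim=True).clamp_min(1e-8)
    return fused / norm

# Toy usage with random data standing in for a CNN backbone's output.
imgs = torch.rand(4, 3, 32, 32)          # e.g. a CIFAR-10 sized batch
ctx = global_context_features(imgs)      # (4, 16) for 8x8 patches
feats = torch.rand(4, 128)               # placeholder convolutional features
out = augment_and_normalise(feats, ctx)  # (4, 144) augmented feature vector
```

Under this reading, the augmented vector can be fed directly to a classifier head, or used as the embedding on which prototype distances are computed in the few-shot setting.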