Few-shot image classification consists of two consecutive learning processes: 1) In the meta-learning stage, the model acquires a knowledge base from a set of training classes. 2) During meta-testing, the acquired knowledge is used to recognize unseen classes from very few examples. Inspired by the compositional way in which humans represent objects, we train a neural network architecture that explicitly represents objects as a dictionary of shared components and their spatial composition. In particular, during meta-learning, we train a knowledge base that consists of a dictionary of component representations and a dictionary of component activation maps, which encode common spatial activation patterns of the components. The elements of both dictionaries are shared among the training classes. During meta-testing, the representation of each unseen class is learned from the component representations and the component activation maps in the knowledge base. Finally, an attention mechanism strengthens the components that are most important for each category. We demonstrate the value of our interpretable compositional learning framework for few-shot classification on miniImageNet, tieredImageNet, CIFAR-FS, and FC100, where it achieves comparable performance.
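To make the idea of a shared component dictionary with per-class attention concrete, the following is a minimal toy sketch and not the architecture described above. Everything in it is an assumption made for illustration: the function names, the use of cosine similarity for component activations, spatial max pooling in place of the dictionary of component activation maps (which is omitted entirely), and a softmax over mean activations as the attention mechanism. Feature maps are given as plain lists of per-position feature vectors rather than the output of a trained network.

```python
import math

def cosine(u, v):
    # Cosine similarity between two feature vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-8)

def image_code(feature_map, components):
    # For each component, compute its activation at every spatial position
    # (clipped at zero) and max-pool over positions.
    # ASSUMPTION: pooling stands in for the spatial activation-map dictionary.
    return [max(max(cosine(f, c), 0.0) for f in feature_map) for c in components]

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def class_representation(support_maps, components, temperature=0.1):
    # Average the component codes of the few support images, then derive
    # attention weights that emphasise the most active components.
    # ASSUMPTION: softmax attention with a hand-picked temperature.
    codes = [image_code(fm, components) for fm in support_maps]
    k = len(components)
    mean_code = [sum(code[i] for code in codes) / len(codes) for i in range(k)]
    attention = softmax([m / temperature for m in mean_code])
    return mean_code, attention

def score(query_map, class_rep, components):
    # Attention-weighted agreement between query code and class code.
    mean_code, attention = class_rep
    q = image_code(query_map, components)
    return sum(a * m * qi for a, m, qi in zip(attention, mean_code, q))

def classify(query_map, class_reps, components):
    return max(class_reps, key=lambda name: score(query_map, class_reps[name], components))

# Hypothetical knowledge base: two 2-D components shared by all classes.
components = [[1.0, 0.0], [0.0, 1.0]]

# One support image per unseen class, each with two spatial positions.
class_reps = {
    "a": class_representation([[[1.0, 0.1], [0.9, 0.0]]], components),
    "b": class_representation([[[0.1, 1.0], [0.0, 0.8]]], components),
}

query = [[0.2, 1.0], [0.1, 0.9]]
print(classify(query, class_reps, components))
```

In this toy setting the component dictionary is fixed and only the per-class codes and attention weights are derived from the support images, which loosely mirrors the split between meta-learning and meta-testing; a faithful implementation would learn both dictionaries from the training classes and model the spatial activation patterns explicitly.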