Multi-Label Image Classification (MLIC) aims to predict the set of labels present in an image. The key to this problem is to mine the associations between image contents and labels, and thereby obtain the correct assignments between images and their labels. In this paper, we treat each image as a bag of instances and reformulate MLIC as an instance-label matching selection problem. To model this problem, we propose a novel deep learning framework named Graph Matching based Multi-Label Image Classification (GM-MLIC), which adopts a Graph Matching (GM) scheme for its strong capability of capturing the relationships between instances and labels. Specifically, we first construct an instance spatial graph and a label semantic graph, and then merge them into an assignment graph by connecting each instance to all labels. Subsequently, a graph network block aggregates and updates the states of all nodes and edges on the assignment graph to form structured representations for each instance and label. Finally, our network derives a prediction score for each instance-label correspondence and optimizes these correspondences with a weighted cross-entropy loss. Extensive experiments on various image datasets demonstrate the superiority of the proposed method.
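The abstract describes the data flow but not the network's equations. The pure-Python sketch below illustrates only that flow: an assignment graph that links every instance to every label, one graph-network step that updates edges and then nodes, a score per instance-label correspondence, and a weighted cross-entropy over those scores. Every concrete choice is a hypothetical stand-in for a learned component, not the GM-MLIC implementation. This includes the scaled dot-product edge function, the residual mean-aggregation node update, the toy features and adjacency matrices, and the `pos_weight` weighting in the loss.

```python
import math
from typing import Dict, List, Tuple

Vec = List[float]
Edge = Tuple[int, int]  # (instance index, label index)


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def assignment_edges(n_inst: int, n_lab: int) -> List[Edge]:
    # Cross edges of the assignment graph: each instance is linked to all labels.
    return [(i, c) for i in range(n_inst) for c in range(n_lab)]


def edge_logits(inst: List[Vec], labs: List[Vec], edges: List[Edge]) -> Dict[Edge, float]:
    # Hypothetical edge function: scaled dot product of instance and label states.
    d = len(inst[0])
    return {(i, c): dot(inst[i], labs[c]) / math.sqrt(d) for i, c in edges}


def _mix(own: Vec, nbrs: List[Tuple[float, Vec]]) -> Vec:
    # Residual update: own state plus the weight-normalised mean of neighbour states.
    total = sum(w for w, _ in nbrs)
    if total == 0.0:
        return list(own)
    return [x + sum(w * v[k] for w, v in nbrs) / total for k, x in enumerate(own)]


def graph_block(inst, labs, inst_adj, lab_adj, edges):
    # One hypothetical graph-network step: update cross edges first, then nodes.
    e = edge_logits(inst, labs, edges)
    new_inst = []
    for i, v in enumerate(inst):
        # Spatial neighbours (weight 1) plus every label, weighted by its edge score.
        nbrs = [(1.0, inst[j]) for j, a in enumerate(inst_adj[i]) if a and j != i]
        nbrs += [(sigmoid(e[(i, c)]), labs[c]) for c in range(len(labs))]
        new_inst.append(_mix(v, nbrs))
    new_labs = []
    for c, u in enumerate(labs):
        # Semantic neighbours (weight 1) plus every instance, weighted by its edge score.
        nbrs = [(1.0, labs[k]) for k, a in enumerate(lab_adj[c]) if a and k != c]
        nbrs += [(sigmoid(e[(i, c)]), inst[i]) for i in range(len(inst))]
        new_labs.append(_mix(u, nbrs))
    return new_inst, new_labs


def correspondence_scores(inst, labs, edges) -> Dict[Edge, float]:
    # Prediction score in (0, 1) for each instance-label correspondence.
    return {ic: sigmoid(z) for ic, z in edge_logits(inst, labs, edges).items()}


def weighted_bce(scores, targets, pos_weight=1.0, eps=1e-7) -> float:
    # Weighted binary cross-entropy averaged over all correspondences;
    # pos_weight > 1 up-weights the positive (matched) pairs (assumed weighting).
    total = 0.0
    for ic, s in scores.items():
        s = min(max(s, eps), 1.0 - eps)
        y = targets[ic]
        total -= pos_weight * y * math.log(s) + (1 - y) * math.log(1.0 - s)
    return total / len(scores)


# Toy usage: 2 instances, 3 labels, 2-d features (all values made up).
inst = [[1.0, 0.0], [0.0, 1.0]]
labs = [[1.0, 0.2], [0.1, 1.0], [-1.0, -1.0]]
inst_adj = [[0, 1], [1, 0]]                   # instance spatial graph
lab_adj = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]   # label semantic graph
edges = assignment_edges(len(inst), len(labs))
inst, labs = graph_block(inst, labs, inst_adj, lab_adj, edges)
scores = correspondence_scores(inst, labs, edges)
targets = {ic: 1 if ic in {(0, 0), (1, 1)} else 0 for ic in edges}
loss = weighted_bce(scores, targets, pos_weight=2.0)
print(f"loss = {loss:.4f}")
```

Because the instance-label pairs are scored independently, the loss treats matching as a per-pair binary decision. Stacking several `graph_block` calls would let spatial and semantic context propagate further before scoring.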