This work addresses the task of multilabel image classification. Inspired by the great success of deep convolutional neural networks (CNNs) for single-label visual-semantic embedding, we explore extending these models to multilabel images. Specifically, we propose an image-dependent ranking model, which returns a ranked list of labels according to their relevance to the input image. In contrast to conventional CNN models that learn an image representation (i.e. the image embedding vector), the proposed model learns a mapping (i.e. a transformation matrix) from an image in an attempt to differentiate between its relevant and irrelevant labels. Despite the conceptual simplicity of our approach, experimental results on a public benchmark dataset demonstrate that the proposed model achieves state-of-the-art performance while using fewer training images than other multilabel classification methods.
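To make the idea concrete, the following is a minimal PyTorch sketch of an image-dependent ranking model of this kind: a CNN backbone predicts a per-image transformation that scores fixed label embeddings, and labels are ranked by these scores. The class names, dimensions, backbone, and the hinge-style pairwise ranking loss are illustrative assumptions, not the paper's exact architecture; in particular, the per-image transformation is reduced to a single row vector here for brevity.

```python
import torch
import torch.nn as nn


class ImageDependentRanker(nn.Module):
    """Sketch: a CNN maps an image to a transformation applied to fixed
    label embeddings, yielding one relevance score per label."""

    def __init__(self, label_embeddings, feat_dim=512, embed_dim=300):
        super().__init__()
        # Fixed label embedding matrix, shape (num_labels, embed_dim),
        # e.g. word vectors for the label vocabulary (assumed here).
        self.register_buffer("label_embeddings", label_embeddings)
        # Any CNN backbone producing a feat_dim-dimensional image feature.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(64, feat_dim),
            nn.ReLU(),
        )
        # Maps the image feature to the per-image transformation that
        # acts on the label embeddings to produce relevance scores.
        self.to_transform = nn.Linear(feat_dim, embed_dim)

    def forward(self, images):
        feat = self.backbone(images)                     # (B, feat_dim)
        transform = self.to_transform(feat)              # (B, embed_dim)
        # Score every label for every image: higher = more relevant.
        scores = transform @ self.label_embeddings.t()   # (B, num_labels)
        return scores


def pairwise_ranking_loss(scores, relevant_mask, margin=1.0):
    """Hinge loss pushing every relevant label to outscore every
    irrelevant label by at least `margin` (a common ranking objective;
    the paper's exact loss may differ)."""
    pos = scores.unsqueeze(2)                            # (B, L, 1)
    neg = scores.unsqueeze(1)                            # (B, 1, L)
    # pair_mask[b, i, j] is True iff label i is relevant and j is not.
    pair_mask = relevant_mask.unsqueeze(2) & ~relevant_mask.unsqueeze(1)
    hinge = torch.clamp(margin - (pos - neg), min=0.0)   # (B, L, L)
    return (hinge * pair_mask).sum() / pair_mask.sum().clamp(min=1)


# Usage sketch (hypothetical shapes): a 20-label vocabulary with 300-d embeddings.
labels = torch.randn(20, 300)
model = ImageDependentRanker(labels, feat_dim=512, embed_dim=300)
images = torch.randn(4, 3, 224, 224)
scores = model(images)                                   # (4, 20)
ranked = scores.argsort(dim=1, descending=True)          # ranked label indices per image
```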