Multi-label image classification is a fundamental but challenging task in computer vision. Over the past few decades, solutions exploring relationships between semantic labels have made great progress. However, the underlying spatial-contextual information of labels is under-exploited. To tackle this problem, a spatial-context-aware deep neural network is proposed to predict labels taking into account both semantic and spatial information. This proposed framework is evaluated on Microsoft COCO and PASCAL VOC, two widely used benchmark datasets for image multi-labelling. The results show that the proposed approach is superior to the state-of-the-art solutions on dealing with the multi-label image classification problem.
翻译:多标签图像分类是计算机愿景中一项根本性但具有挑战性的任务。 在过去几十年中,探索语义标签之间关系的解决方案取得了巨大进展。 但是,标签的基本空间-理论信息没有得到充分利用。为了解决这一问题,建议建立一个空间-具有文字意识的深神经网络,以预测标签,同时考虑到语义和空间信息。这个拟议框架在微软COCOCO和PASAL VOC(两个广泛使用的图像多标签基准数据集)上进行了评估。结果显示,拟议方法优于处理多标签图像分类问题的最新解决方案。