Images and videos usually contain multiple objects or actions. Thanks to the rapid development of deep learning, multi-label recognition has achieved remarkable performance. Recently, graph convolutional networks (GCNs) have been leveraged to further boost multi-label recognition. However, the best way to model label correlations, and how feature learning can be improved with awareness of the label system, both remain unclear. In this paper, we propose a label graph superimposing framework that improves the conventional GCN+CNN framework for multi-label recognition in two respects. First, we model label correlations by superimposing a label graph built from statistical co-occurrence information onto a graph constructed from knowledge priors of the labels, and then apply multi-layer graph convolutions on the final superimposed graph to abstract label embeddings. Second, we propose leveraging the embedding of the whole label system for better representation learning. Specifically, lateral connections between the GCN and the CNN are added at shallow, middle, and deep layers to inject label-system information into the backbone CNN, making the feature learning process label-aware. Extensive experiments on the MS-COCO and Charades datasets show that our solution greatly improves recognition performance and achieves new state-of-the-art results.
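The pipeline described above (build two label graphs, superimpose them, run stacked graph convolutions to get label embeddings, inject label information into the CNN, and score each label) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the knowledge-prior graph is approximated by cosine similarity between label word vectors, "superimposing" is modeled as a convex combination with a hypothetical weight `alpha`, the adjacency is row-normalized, the GCN uses LeakyReLU, and the lateral connection simply adds a projected mean label embedding to a CNN feature map. All function names are hypothetical, and no training is performed.

```python
import numpy as np


def cooccurrence_graph(labels):
    """Statistical graph: A[i, j] = P(label j | label i), estimated from a
    binary (num_samples, num_labels) annotation matrix."""
    counts = labels.T @ labels                    # counts[i, j]: samples with both i and j
    occurrences = np.diag(counts).astype(float)   # samples containing label i
    cond = counts / np.maximum(occurrences[:, None], 1.0)
    np.fill_diagonal(cond, 0.0)                   # self-loops are added during normalization
    return cond


def knowledge_graph(word_vectors):
    """Prior graph (assumption): non-negative cosine similarity of label word vectors."""
    unit = word_vectors / np.linalg.norm(word_vectors, axis=1, keepdims=True)
    sim = np.clip(unit @ unit.T, 0.0, None)
    np.fill_diagonal(sim, 0.0)
    return sim


def superimpose(a_stat, a_prior, alpha=0.5):
    """Hypothetical superimposition: convex combination of the two graphs."""
    return alpha * a_stat + (1.0 - alpha) * a_prior


def normalize_adjacency(a):
    """Row-normalize A + I so each node averages over itself and its neighbors."""
    a_hat = a + np.eye(a.shape[0])
    return a_hat / a_hat.sum(axis=1, keepdims=True)


def gcn_label_embeddings(a_norm, x, weights, slope=0.2):
    """Stacked graph convolutions H <- LeakyReLU(A_norm H W); the last layer is linear."""
    h = x
    for layer, w in enumerate(weights):
        h = a_norm @ h @ w
        if layer < len(weights) - 1:
            h = np.where(h > 0, h, slope * h)
    return h


def lateral_inject(feature_map, label_emb, proj):
    """Hypothetical lateral connection: project the mean label embedding (D,)
    to the channel width C and add it at every spatial location of a (C, H, W) map."""
    bias = label_emb.mean(axis=0) @ proj          # proj has shape (D, C)
    return feature_map + bias[:, None, None]


def multilabel_scores(image_feature, label_emb):
    """Per-label logits: dot product of the pooled image feature with each label embedding."""
    return label_emb @ image_feature


rng = np.random.default_rng(0)
num_labels, word_dim, feat_dim = 5, 8, 16
labels = (rng.random((100, num_labels)) > 0.6).astype(float)
words = rng.normal(size=(num_labels, word_dim))

a = normalize_adjacency(superimpose(cooccurrence_graph(labels), knowledge_graph(words)))
weights = [rng.normal(size=(word_dim, 12)) * 0.1, rng.normal(size=(12, feat_dim)) * 0.1]
label_emb = gcn_label_embeddings(a, words, weights)          # (num_labels, feat_dim)

fmap = lateral_inject(rng.normal(size=(feat_dim, 7, 7)), label_emb,
                      rng.normal(size=(feat_dim, feat_dim)))
scores = multilabel_scores(fmap.mean(axis=(1, 2)), label_emb)
print(scores.shape)  # (5,)
```

In this sketch the same label embeddings serve two roles, as label-wise classifiers and as the source of the injected label-system signal; in the actual framework the injection happens at several backbone depths and all weights are learned end to end.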