Despite achieving impressive progress, current multi-label image recognition (MLR) algorithms heavily depend on large-scale datasets with complete labels, and collecting such datasets is extremely time-consuming and labor-intensive. Training multi-label image recognition models with partial labels (MLR-PL), in which merely some labels are known while the others are unknown for each image (see Figure 1), is an alternative way to address this issue. However, current MLR-PL algorithms mainly rely on pre-trained image classification or similarity models to generate pseudo labels for the unknown labels. Thus, they depend on a certain amount of data annotations and inevitably suffer from obvious performance drops, especially when the known label proportion is low. To address this dilemma, we propose a unified semantic-aware representation blending (SARB) framework that consists of two crucial modules, which blend multi-granularity category-specific semantic representations across different images to transfer information of known labels to complement unknown labels. Extensive experiments on the MS-COCO, Visual Genome, and Pascal VOC 2007 datasets show that the proposed SARB consistently outperforms current state-of-the-art algorithms on all known label proportion settings. Concretely, it obtains average mAP improvements of 1.9%, 4.5%, and 1.0% on the three benchmark datasets compared with the second-best algorithm.
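To make the idea of blending category-specific representations across images more concrete, the following is a minimal sketch, not the actual SARB modules: it assumes each image has been mapped to one feature vector per category (a hypothetical [C, D] tensor), and mixes the representations of categories whose labels are known in one image into another image where those labels are unknown, so the known-label information can complement the unknown labels. The function name, tensor shapes, and mixing coefficient are all illustrative assumptions.

```python
import torch

def blend_category_representations(feat_a, feat_b, known_mask_a, alpha=0.5):
    """Hypothetical sketch of category-specific representation blending.

    feat_a, feat_b: category-specific features of shape [C, D]
                    (one D-dim representation per category) from two images.
    known_mask_a:   boolean tensor of shape [C]; True where image A's label
                    for that category is known.
    alpha:          blending coefficient (assumed fixed here).

    Returns a blended feature tensor for image B in which categories known
    in image A are mixed in, transferring known-label information to
    complement image B's unknown labels.
    """
    blended = feat_b.clone()
    # Only blend categories whose labels are known in image A.
    blended[known_mask_a] = (
        alpha * feat_a[known_mask_a] + (1.0 - alpha) * feat_b[known_mask_a]
    )
    return blended

if __name__ == "__main__":
    # Toy usage: C=80 categories (e.g., MS-COCO), D=512-dim features.
    C, D = 80, 512
    feat_a, feat_b = torch.randn(C, D), torch.randn(C, D)
    known_mask_a = torch.rand(C) > 0.5  # pretend half of A's labels are known
    blended = blend_category_representations(feat_a, feat_b, known_mask_a)
    print(blended.shape)  # torch.Size([80, 512])
```

In this sketch the blended features would serve as additional training signal for categories that lack annotations in image B; the actual SARB framework performs such blending at multiple granularities via its two modules, whose details are beyond this abstract.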