Compared with multi-class classification, multi-label classification, in which an instance may belong to more than one class, is better suited to real-life scenarios. Obtaining fully labeled, high-quality datasets for multi-label classification is, however, extremely expensive and sometimes infeasible in terms of annotation effort, especially when the label space is large. This motivates research on partial-label classification, where only a limited number of labels are annotated and the rest are missing. To address this problem, we first propose a pseudo-label-based approach that reduces the cost of annotation without adding complexity to existing classification networks. We then quantitatively study the impact of missing labels on the performance of the classifier. Furthermore, by designing a novel loss function, we relax the requirement, assumed by most existing approaches, that each instance must contain at least one positive label. Through comprehensive experiments on three large-scale multi-label image datasets, namely MS-COCO, NUS-WIDE, and Pascal VOC12, we show that our method can handle the imbalance between positive and negative labels while still outperforming existing missing-label learning approaches in most cases, and in some cases even approaches trained on fully labeled datasets.
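To make the partial-label setting concrete, the following is a minimal sketch of a masked binary cross-entropy, in which missing labels are simply skipped. It is a common baseline for this setting and is not the loss function proposed in this work, which the abstract does not specify. The function name `masked_bce` and the use of `None` to mark a missing label are assumptions made for illustration.

```python
import math


def masked_bce(probs, labels):
    """Mean binary cross-entropy over annotated labels only.

    probs  : predicted probabilities, one per label
    labels : 1 (positive), 0 (negative), or None (annotation missing)

    Illustrative baseline only; not the loss proposed in the paper.
    """
    eps = 1e-12  # keeps log() finite when a probability is exactly 0 or 1
    total, count = 0.0, 0
    for p, y in zip(probs, labels):
        if y is None:
            continue  # missing label: contributes nothing to the loss
        p = min(max(p, eps), 1.0 - eps)
        total -= math.log(p) if y == 1 else math.log(1.0 - p)
        count += 1
    # An instance with no annotated labels yields a loss of zero.
    return total / count if count else 0.0


# Three labels: one positive, one negative, one unannotated.
loss = masked_bce([0.9, 0.2, 0.5], [1, 0, None])
```

Because unannotated entries are skipped, this sketch does not require an instance to carry a positive label, but it also makes no attempt to recover the missing labels or to rebalance positives against negatives, which are the issues the abstract says the proposed method addresses.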