Multi-label learning deals with the problem where each instance is associated with multiple labels simultaneously. Most existing approaches aim to improve multi-label learning performance by exploiting label correlations. Although data augmentation is widely used in many machine learning tasks, it remains unclear whether it helps multi-label learning. In this article, we propose to leverage data augmentation to improve the performance of multi-label learning. Specifically, we first propose a novel data augmentation approach that clusters the real examples and treats the cluster centers as virtual examples; these virtual examples naturally embody local label correlations and label importances. Then, motivated by the cluster assumption that examples in the same cluster should have the same label, we propose a novel regularization term that bridges the gap between real and virtual examples and promotes the local smoothness of the learning function. Extensive experiments on a number of real-world multi-label datasets demonstrate that our proposed approach outperforms state-of-the-art counterparts.
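The abstract does not state which clustering algorithm is used, how a virtual example's labels are formed, or the exact form of the regularizer. The sketch below is an illustration under assumptions, not the authors' method. It assumes k-means clustering. It gives each virtual example a soft label vector equal to the mean of its members' 0/1 label vectors, which is one plausible reading of "label importances". It also uses a squared-difference penalty between the model's prediction on each real example and its prediction on that example's cluster center. All function names (`kmeans`, `make_virtual_examples`, `cluster_smoothness_penalty`, `augmented_objective`, `linear_model`) are hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(rows: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def kmeans(X: List[Vector], k: int, n_iter: int = 100,
           seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Plain Lloyd's k-means (an ASSUMED choice of clustering algorithm)."""
    rng = random.Random(seed)
    centers = [list(map(float, x)) for x in rng.sample(X, k)]
    assign: Optional[List[int]] = None
    for _ in range(n_iter):
        # Assignment step: each real example goes to its nearest center.
        new_assign = [min(range(k), key=lambda j: _sq_dist(x, centers[j])) for x in X]
        if new_assign == assign:
            break  # assignments are stable, so the centers are too
        assign = new_assign
        # Update step: each center becomes the mean of its members
        # (an empty cluster keeps its previous center).
        for j in range(k):
            members = [x for x, a in zip(X, assign) if a == j]
            if members:
                centers[j] = _mean(members)
    return centers, assign


def make_virtual_examples(X: List[Vector], Y: List[List[int]], k: int, seed: int = 0):
    """Treat cluster centers as virtual examples. ASSUMPTION: a virtual example's
    soft label vector is the fraction of its cluster's members carrying each label."""
    centers, assign = kmeans(X, k, seed=seed)
    n_labels = len(Y[0])
    virtual_Y = []
    for j in range(k):
        members = [y for y, a in zip(Y, assign) if a == j]
        virtual_Y.append(_mean(members) if members else [0.0] * n_labels)
    return centers, virtual_Y, assign


Predictor = Callable[[Vector], Vector]


def cluster_smoothness_penalty(predict: Predictor, X: List[Vector],
                               centers: List[Vector], assign: List[int]) -> float:
    """ASSUMED regularizer: mean squared gap between predictions on each real
    example and on its cluster's virtual example (cluster assumption)."""
    total = sum(_sq_dist(predict(x), predict(centers[a])) for x, a in zip(X, assign))
    return total / len(X)


def augmented_objective(predict: Predictor, X, Y, centers, virtual_Y, assign,
                        lam: float = 0.1) -> float:
    """Squared loss over real plus virtual examples, plus lam times the penalty."""
    all_X, all_Y = X + centers, Y + virtual_Y
    fit = sum(_sq_dist(predict(x), y) for x, y in zip(all_X, all_Y)) / len(all_X)
    return fit + lam * cluster_smoothness_penalty(predict, X, centers, assign)


def linear_model(W: List[Vector]) -> Predictor:
    """Hypothetical linear scorer f(x) = W x, one output per label."""
    return lambda x: [sum(w * xi for w, xi in zip(row, x)) for row in W]


if __name__ == "__main__":
    X = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    Y = [[1, 0], [1, 0], [0, 1], [1, 1]]
    centers, virtual_Y, assign = make_virtual_examples(X, Y, k=2)
    predict = linear_model([[0.1, 0.0], [0.0, 0.1]])
    print(centers, virtual_Y)
    print(augmented_objective(predict, X, Y, centers, virtual_Y, assign, lam=0.5))
```

The penalty is zero for any model whose predictions are constant within each cluster, which is the smoothness the cluster assumption asks for. The weight `lam` (hypothetical) trades that smoothness off against fitting both the real and the virtual labels.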