Semi-supervised learning (SSL) is a common approach to learning predictive models from both labeled and unlabeled examples. While SSL for the simple tasks of classification and regression has received a lot of attention from the research community, it has not been properly investigated for complex prediction tasks with structurally dependent variables. This is the case for multi-label classification and hierarchical multi-label classification, which may require additional information, possibly derived from the underlying distribution in the descriptive space revealed by unlabeled examples, to better address the challenging task of simultaneously predicting multiple class labels. In this paper, we investigate this aspect and propose a (hierarchical) multi-label classification method based on semi-supervised learning of predictive clustering trees. We also extend the method towards ensemble learning and propose a method based on the random forest approach. An extensive experimental evaluation conducted on 23 datasets shows significant advantages of the proposed method and its extension with respect to their supervised counterparts. Moreover, the method preserves interpretability and reduces the time complexity of classical tree-based models.
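To illustrate how unlabeled examples could inform tree induction, the following is a minimal sketch of one plausible split heuristic, not the method as specified in this abstract. It assumes that a candidate split is scored by a weighted sum of label impurity (computed on labeled examples only) and feature variance (computed on all examples). The weight `w`, the function names, the use of Gini impurity, and the toy data are all illustrative assumptions; features are assumed to be numeric and on comparable scales.

```python
from statistics import pvariance


def gini(column):
    """Gini impurity of one binary label column (values 0 or 1)."""
    if not column:
        return 0.0
    p = sum(column) / len(column)
    return 2.0 * p * (1.0 - p)


def ssl_impurity(examples, w):
    """Impurity of a set of (features, labels) pairs; labels is None if unlabeled.

    Assumed form: w * mean label impurity (labeled examples only)
                  + (1 - w) * mean feature variance (all examples).
    """
    if not examples:
        return 0.0
    labeled = [y for _, y in examples if y is not None]
    if labeled:
        n_labels = len(labeled[0])
        label_term = sum(gini([y[j] for y in labeled]) for j in range(n_labels)) / n_labels
    else:
        # Assumption: a node without labeled examples contributes no label impurity.
        label_term = 0.0
    n_features = len(examples[0][0])
    feature_term = sum(
        pvariance([x[j] for x, _ in examples]) for j in range(n_features)
    ) / n_features
    return w * label_term + (1.0 - w) * feature_term


def split_gain(examples, feature, threshold, w):
    """Impurity reduction obtained by the test x[feature] <= threshold."""
    left = [e for e in examples if e[0][feature] <= threshold]
    right = [e for e in examples if e[0][feature] > threshold]
    if not left or not right:
        return 0.0
    n = len(examples)
    return (
        ssl_impurity(examples, w)
        - len(left) / n * ssl_impurity(left, w)
        - len(right) / n * ssl_impurity(right, w)
    )


# Toy data: one feature, two binary labels, only two of six examples labeled.
examples = [
    ((0.0,), (1, 0)),
    ((0.1,), None),
    ((0.2,), None),
    ((0.9,), (0, 1)),
    ((1.0,), None),
    ((1.1,), None),
]

supervised_a = split_gain(examples, 0, 0.05, w=1.0)
supervised_b = split_gain(examples, 0, 0.5, w=1.0)
semi_a = split_gain(examples, 0, 0.05, w=0.5)
semi_b = split_gain(examples, 0, 0.5, w=0.5)
```

With `w=1.0` the score depends only on the two labeled examples, so the thresholds 0.05 and 0.5 separate them equally well and cannot be told apart. With `w=0.5` the unlabeled examples expose the two clusters in the descriptive space, and the threshold between the clusters receives the higher score.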