Multi-label classification is a challenging task, particularly in domains where the number of labels to be predicted is large. Deep neural networks are often effective at multi-label classification of images and textual data. When dealing with tabular data, however, conventional machine learning algorithms, such as tree ensembles, appear to outperform the competition. Random forest, a popular ensemble algorithm, has found use in a wide range of real-world problems. Such problems include fraud detection in the financial domain, crime hotspot detection in the legal sector, and, in the biomedical field, disease probability prediction when patient records are accessible. Because they affect people's lives, these domains usually require decision-making systems to be explainable. Random forest falls short on this property, especially when a large number of tree predictors are used. A recent technique named LionForests addressed this issue for single-label classification and regression. In this work, we adapt this technique to multi-label classification problems by employing three different strategies regarding the labels that the explanation covers. Finally, we provide a set of qualitative and quantitative experiments to assess the efficacy of this approach.