Multi-label classification (MLC) has recently received increasing interest from the machine learning community. Several studies review MLC methods and datasets, and a few compare MLC methods empirically. However, they are limited in the number of methods and datasets considered. This work provides a comprehensive empirical study of a wide range of MLC methods on a large collection of datasets from various domains. More specifically, our study evaluates 26 methods on 42 benchmark datasets using 20 evaluation measures. The adopted evaluation methodology adheres to the highest literature standards for designing and executing large-scale, time-budgeted experimental studies. First, the methods are selected based on their usage by the community, ensuring that methods from across the MLC taxonomy and with different base learners are represented. Second, the datasets cover a wide range of complexity and application domains. Third, the selected evaluation measures assess both the predictive performance and the efficiency of the methods. The results of the analysis identify RFPCT, RFDTBR, ECCJ48, EBRJ48 and AdaBoostMH as the best-performing methods across the spectrum of performance measures. Whenever a new method is introduced, it should be compared to different subsets of MLC methods, with each subset chosen according to the evaluation criterion of interest.