Multilabel learning is an important topic in machine learning research. Evaluating models in multilabel settings requires cross-validation methods designed specifically for multilabel data. In this article, we show a weakness in an evaluation metric widely used in the literature, and we present improved versions of this metric together with a general method, optisplit, for optimising cross-validation splits. We present an extensive comparison of various types of cross-validation methods, in which we show that optisplit produces better cross-validation splits than the existing methods and that it is fast enough to be used on big Gene Ontology (GO) datasets.