Although the question lies at the very methodological core of machine learning, there is still no unanimous consensus on how to compare classifiers. Every comparison framework is confronted with (at least) three fundamental challenges: the multiplicity of quality criteria, the multiplicity of data sets, and the randomness/arbitrariness of the selection of data sets. In this paper, we add a fresh perspective to the lively debate by adopting recent developments in decision theory. Our resulting framework, based on so-called preference systems, ranks classifiers by a generalized concept of stochastic dominance, which powerfully circumvents the cumbersome, and often even self-contradictory, reliance on aggregates. Moreover, we show that generalized stochastic dominance can be operationalized by solving easy-to-handle linear programs and statistically tested by means of an adapted two-sample observation-randomization test. This yields a powerful framework for the statistical comparison of classifiers with respect to multiple quality criteria simultaneously. We illustrate and investigate our framework in a simulation study and on standard benchmark data sets.
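To convey the flavour of the observation-randomization testing mentioned above, the following minimal Python sketch runs a generic two-sample permutation test on paired per-data-set scores. It is illustrative only: it uses a single quality criterion (accuracy) and the mean paired difference as a simplified test statistic, whereas the paper's actual statistic is derived from generalized stochastic dominance over a preference system of multiple criteria and is computed via linear programming. All data shown are hypothetical.

```python
# Illustrative sketch only -- NOT the paper's exact procedure. It shows the
# general shape of a two-sample observation-randomization (permutation) test:
# under the null hypothesis, the two classifiers are exchangeable, so the sign
# of each paired per-data-set score difference can be flipped at random.
import numpy as np

def randomization_test(scores_a, scores_b, n_perm=10_000, seed=0):
    """Two-sided permutation test for paired per-data-set scores."""
    rng = np.random.default_rng(seed)
    diffs = np.asarray(scores_a) - np.asarray(scores_b)
    observed = diffs.mean()  # simplified statistic; the paper uses an
                             # LP-based generalized-dominance statistic
    count = 0
    for _ in range(n_perm):
        signs = rng.choice([-1.0, 1.0], size=diffs.shape)
        if abs((signs * diffs).mean()) >= abs(observed):
            count += 1
    return (count + 1) / (n_perm + 1)  # permutation p-value

# Hypothetical accuracies of two classifiers on eight benchmark data sets:
acc_a = [0.91, 0.88, 0.76, 0.83, 0.95, 0.70, 0.89, 0.81]
acc_b = [0.89, 0.85, 0.78, 0.80, 0.93, 0.69, 0.87, 0.80]
print(f"p-value: {randomization_test(acc_a, acc_b):.4f}")
```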