The area under the ROC curve (AUC) is a widely used performance measure for classification models. We propose two new distributionally robust AUC maximization models (DR-AUC) that rely on the Kantorovich metric and approximate the AUC with the hinge loss function. We consider two cases, in which the support of the worst-case distribution is fixed and variable, respectively. We use duality theory to reformulate the DR-AUC models and derive tractable convex optimization problems. The numerical experiments show that the proposed DR-AUC models, benchmarked against the standard deterministic AUC and support vector machine models, perform better in general and, in particular, improve the worst-case out-of-sample performance on the majority of the considered datasets, thereby demonstrating their robustness. The results are particularly encouraging because our numerical experiments are conducted with small training sets, which are known to be conducive to low out-of-sample performance.
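To make the hinge-loss approximation of the AUC concrete, the following is a minimal sketch of the nominal (non-robust) surrogate only; it does not implement the DR-AUC reformulations. The linear scorer `w`, the toy data, the unit margin, and the function names `empirical_auc` and `pairwise_hinge_loss` are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def empirical_auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    diff = scores_pos[:, None] - scores_neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def pairwise_hinge_loss(scores_pos, scores_neg, margin=1.0):
    """Mean hinge loss over all (positive, negative) score differences.

    With margin 1 this is a convex upper bound on 1 - AUC.
    The margin value is an assumption made for illustration.
    """
    diff = scores_pos[:, None] - scores_neg[None, :]
    return float(np.mean(np.maximum(0.0, margin - diff)))


# Hypothetical linear scorer and toy data (not from the paper).
w = np.array([1.0, -0.5])
X_pos = np.array([[2.0, 0.0], [1.0, 1.0]])
X_neg = np.array([[0.0, 1.0], [1.5, 0.0]])

s_pos = X_pos @ w  # scores of the positive examples
s_neg = X_neg @ w  # scores of the negative examples

auc = empirical_auc(s_pos, s_neg)
loss = pairwise_hinge_loss(s_pos, s_neg)
print(auc, loss)
```

On this toy data three of the four positive/negative pairs are ranked correctly, and the hinge surrogate is at least as large as 1 - AUC. That bound is what makes minimizing the surrogate a reasonable convex stand-in for maximizing the AUC.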