Areas under the ROC curve (AUROC) and the precision-recall curve (AUPRC) are common metrics for evaluating classification performance on imbalanced problems. Compared with AUROC, AUPRC is a more appropriate metric for highly imbalanced datasets. While direct optimization of AUROC has been studied extensively, optimization of AUPRC has rarely been explored. In this work, we propose a principled technical method to optimize AUPRC for deep learning. Our approach is based on maximizing average precision (AP), which is an unbiased point estimator of AUPRC. We show that the surrogate loss function for AP is highly non-convex and more complicated than that of AUROC. We cast the objective as a sum of dependent compositional functions whose inner functions depend on random variables of the outer level. Leveraging recent advances in stochastic compositional optimization, we propose efficient adaptive and non-adaptive stochastic algorithms with provable convergence guarantees under mild conditions. Extensive experimental results on graph and image datasets demonstrate that our proposed method outperforms prior methods on imbalanced problems. To the best of our knowledge, our work represents the first attempt to optimize AUPRC with provable convergence.
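As a point of reference for the quantity being maximized, the following minimal sketch (not the paper's training objective or surrogate loss) computes AP as the mean of precision-at-rank over the positive examples, which is the standard point estimate of AUPRC:

```python
import numpy as np

def average_precision(scores, labels):
    """AP = mean over positives of precision at each positive's rank.

    scores: array of prediction scores (higher = more likely positive).
    labels: binary array, 1 for positive examples, 0 for negative.
    """
    labels = np.asarray(labels)
    order = np.argsort(-np.asarray(scores))  # rank examples by descending score
    ranked = labels[order]
    hits = np.cumsum(ranked)                 # number of positives up to each rank
    ranks = np.arange(1, len(ranked) + 1)
    # precision@k evaluated only at ranks where a positive appears
    prec_at_positives = hits[ranked == 1] / ranks[ranked == 1]
    return prec_at_positives.mean()
```

For example, with scores `[0.9, 0.8, 0.7, 0.6]` and labels `[1, 0, 1, 0]`, the positives sit at ranks 1 and 3 with precisions 1/1 and 2/3, giving AP = 5/6. The paper's contribution is a differentiable surrogate of this ranking-based quantity that can be optimized stochastically, since AP itself is non-smooth in the model scores.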