Areas under the ROC curve (AUROC) and the precision-recall curve (AUPRC) are common metrics for evaluating classification performance on imbalanced problems. Compared with AUROC, AUPRC is a more appropriate metric for highly imbalanced datasets. While stochastic optimization of AUROC has been studied extensively, principled stochastic optimization of AUPRC has rarely been explored. In this work, we propose a principled method for optimizing AUPRC in deep learning. Our approach is based on maximizing the average precision (AP), which is an unbiased point estimator of AUPRC. We cast the objective as a sum of {\it dependent compositional functions}, whose inner functions depend on random variables of the outer level. By leveraging recent advances in stochastic compositional optimization, we propose efficient adaptive and non-adaptive stochastic algorithms, named SOAP, with {\it provable convergence guarantees under mild conditions}. Extensive experiments on image and graph datasets demonstrate that our method outperforms prior methods on imbalanced problems in terms of AUPRC. To the best of our knowledge, this work is the first attempt to optimize AUPRC with provable convergence. SOAP has been implemented in the libAUC library at~\url{https://libauc.org/}.
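A minimal sketch of the quantities involved, assuming real-valued model scores h(x) and binary labels. `average_precision` computes AP as the mean, over positives i, of the fraction of examples scoring at least h(x_i) that are positive. `inner_estimates` replaces that indicator with a squared-hinge surrogate (an assumption here, with a hypothetical `margin`), giving each positive i an inner pair g_i = (g_{i,1}, g_{i,2}) that depends on the outer index i. The objective is then the average of f(g_i) = g_{i,1} / g_{i,2}, which is a sum of dependent compositional functions. `moving_average` illustrates one way to track g_i across mini-batches with a hypothetical step size `gamma`. All function names are illustrative and are not the libAUC API or necessarily the exact SOAP formulation.

```python
import numpy as np


def average_precision(scores, labels):
    """Rank-based AP: mean over positives i of the precision at threshold h(x_i).

    precision_i = #{positives s with score_s >= score_i} / #{all s with score_s >= score_i}
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos = labels == 1
    # ge[i, s] is True when score_s >= score_i
    ge = scores[None, :] >= scores[:, None]
    pos_above = (ge & pos[None, :]).sum(axis=1)
    all_above = ge.sum(axis=1)
    return float(np.mean(pos_above[pos] / all_above[pos]))


def squared_hinge(margin, diff):
    # Assumed smooth surrogate for 1[score_s >= score_i], where diff = score_i - score_s
    return np.maximum(margin - diff, 0.0) ** 2


def inner_estimates(scores, labels, margin=1.0):
    """For every positive anchor i, return g_i = (mean_s l(s,i) * 1[y_s = 1], mean_s l(s,i))."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos_idx = np.flatnonzero(labels == 1)
    # loss[k, s] compares example s against the k-th positive anchor
    loss = squared_hinge(margin, scores[pos_idx][:, None] - scores[None, :])
    g1 = (loss * (labels == 1)[None, :]).mean(axis=1)
    g2 = loss.mean(axis=1)  # > 0 because the anchor itself contributes margin**2
    return pos_idx, g1, g2


def surrogate_ap(g1, g2):
    # Outer function f(u1, u2) = u1 / u2, averaged over positive anchors (to be maximized)
    return float(np.mean(g1 / g2))


def moving_average(u, g_batch, gamma=0.9):
    # Hypothetical tracking of the inner values for sampled anchors:
    # u <- (1 - gamma) * u + gamma * g_batch
    return (1.0 - gamma) * u + gamma * g_batch
```

For example, `average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])` gives 5/6: the first positive has precision 1 and the second has precision 2/3. Full-batch means are used above only for readability. A stochastic method would estimate g_i from sampled mini-batches, and because f is nonlinear, f of a noisy mini-batch estimate is a biased estimate of f(g_i); tracking the inner values across iterations is one way to reduce that bias.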