In this paper, we show that the arc length of the optimal ROC curve is an $f$-divergence. Leveraging this result, we express the arc length as a variational objective and estimate it accurately from positive and negative samples. We show that this estimator has a non-parametric convergence rate of $O_p(n^{-\beta/4})$, where $\beta \in (0,1]$ depends on the smoothness. Using the same technique, we show that the surface area between the optimal ROC curve and the diagonal can be expressed via a similar variational objective. These new insights lead to a novel classification procedure that maximizes an approximate lower bound of the maximal AUC. Experiments on CIFAR-10 datasets show that the proposed two-step procedure achieves good AUC performance in imbalanced binary classification tasks.
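The claim that the arc length is an $f$-divergence can be checked numerically on a toy problem. The sketch below is an illustration under stated assumptions, not the estimator proposed in the paper: it assumes a negative class $q = N(0,1)$ and a positive class $p = N(\mu,1)$ (a hypothetical example chosen because the likelihood ratio $r = p/q$ is monotone in $x$, so thresholding $x$ traces the optimal ROC curve), and it assumes the arc length takes the form $\int \sqrt{p(x)^2 + q(x)^2}\,dx = E_q[\sqrt{1 + r(x)^2}]$, which follows from parametrizing the curve by the threshold. The paper's exact normalization of $f$ may differ. All function names are illustrative.

```python
import math
import numpy as np

def normal_pdf(x, mean):
    """Density of N(mean, 1) evaluated on an array."""
    return np.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def normal_cdf(x, mean):
    """CDF of N(mean, 1) evaluated on an array (via math.erf)."""
    erf = np.vectorize(math.erf)
    return 0.5 * (1.0 + erf((x - mean) / math.sqrt(2.0)))

def trapezoid(y, x):
    """Plain trapezoidal rule, written out to avoid version-specific NumPy names."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def arc_length_from_densities(mu, grid):
    """f-divergence form: integral of sqrt(p^2 + q^2) = E_q[sqrt(1 + r^2)]."""
    p = normal_pdf(grid, mu)   # positive class (assumed Gaussian)
    q = normal_pdf(grid, 0.0)  # negative class (assumed Gaussian)
    return trapezoid(np.sqrt(p ** 2 + q ** 2), grid)

def arc_length_from_roc(mu, grid):
    """Geometric form: length of the polyline through (FPR(t), TPR(t))."""
    fpr = 1.0 - normal_cdf(grid, 0.0)
    tpr = 1.0 - normal_cdf(grid, mu)
    return float(np.sum(np.hypot(np.diff(fpr), np.diff(tpr))))

grid = np.linspace(-12.0, 14.0, 20001)  # thresholds covering both classes
mu = 2.0                                # assumed class separation
length_divergence = arc_length_from_densities(mu, grid)
length_geometric = arc_length_from_roc(mu, grid)
print(length_divergence, length_geometric)
```

Under these assumptions the two numbers agree up to discretization error. The length lies between $\sqrt{2}$ (identical classes, where the ROC curve is the diagonal) and $2$ (perfectly separable classes, where the curve follows two sides of the unit square).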