Receiver Operating Characteristic (ROC) curves are plots of true positive rate versus false positive rate which are used to evaluate binary classification algorithms. Because the Area Under the Curve (AUC) is a piecewise constant function of the predicted values, learning algorithms instead optimize convex relaxations that involve a sum over all pairs of labeled positive and negative examples. Naive learning algorithms compute the gradient in quadratic time, which is too slow for learning with large batch sizes. We propose a new functional representation of the square loss and the squared hinge loss, which results in algorithms that compute the gradient in either linear or log-linear time, and makes it possible to use gradient descent learning with large batch sizes. In our empirical study of supervised binary classification problems, we show that our new algorithm can achieve higher test AUC values on imbalanced data sets than previous algorithms, and can use larger batch sizes than were previously feasible.
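To make the complexity gap concrete, the following is a minimal NumPy sketch, not the paper's exact functional representation, contrasting the naive quadratic-time gradient of the all-pairs squared hinge loss with a sort-based log-linear computation; the function names and the `margin` parameter are illustrative assumptions.

```python
# Sketch: naive O(|P|*|N|) versus sort-based O(n log n) computation of the
# all-pairs squared hinge loss sum_{i,j} max(0, margin - (pos_i - neg_j))^2.
# Illustrative only; names and margin handling are assumptions.
import numpy as np

def naive_squared_hinge(pos, neg, margin=1.0):
    """O(|P|*|N|): materializes the full pairwise matrix."""
    slack = np.maximum(margin - (pos[:, None] - neg[None, :]), 0.0)
    loss = np.sum(slack ** 2)
    grad_pos = -2.0 * slack.sum(axis=1)   # d loss / d pos_i
    grad_neg = 2.0 * slack.sum(axis=0)    # d loss / d neg_j
    return loss, grad_pos, grad_neg

def fast_squared_hinge(pos, neg, margin=1.0):
    """O(n log n): identical outputs via sorting, binary search, cumulative sums."""
    neg_s, pos_s = np.sort(neg), np.sort(pos)
    # Suffix sums over sorted negatives, with a trailing 0 for empty suffixes:
    # S1[k] = sum(neg_s[k:]), S2[k] = sum(neg_s[k:] ** 2).
    S1 = np.append(np.cumsum(neg_s[::-1])[::-1], 0.0)
    S2 = np.append(np.cumsum((neg_s ** 2)[::-1])[::-1], 0.0)
    # A pair (i, j) has nonzero hinge iff neg_j > pos_i - margin.
    idx = np.searchsorted(neg_s, pos - margin, side="right")
    k = len(neg_s) - idx                  # active negatives per positive
    c = margin - pos
    # sum over active j of (neg_j + c_i)^2, expanded into suffix sums:
    loss = np.sum(S2[idx] + 2.0 * c * S1[idx] + k * c ** 2)
    grad_pos = -2.0 * (S1[idx] + k * c)
    # For each negative, the active positives satisfy pos_i < neg_j + margin.
    P1 = np.append(0.0, np.cumsum(pos_s))  # prefix sums of sorted positives
    jdx = np.searchsorted(pos_s, neg + margin, side="left")
    grad_neg = 2.0 * (jdx * (neg + margin) - P1[jdx])
    return loss, grad_pos, grad_neg
```

On a batch with thousands of positives and negatives the two functions agree to floating-point precision, but the sort-based version never materializes the |P| x |N| matrix, which is what makes large batch sizes practical.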
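For the square loss (no hinge), the all-pairs sum expands algebraically into first and second moments of the predicted scores, so a sketch along the following lines computes the loss and gradients in linear time without sorting; again this is an illustration under the same assumed margin formulation, not necessarily the paper's representation.

```python
# Sketch: O(|P| + |N|) computation of the all-pairs square loss
# sum_{i,j} (margin - (pos_i - neg_j))^2 via first and second moments.
import numpy as np

def linear_square_loss(pos, neg, margin=1.0):
    """O(n): no sorting, no pairwise matrix; only sums of scores and squares."""
    c = margin - pos                        # per-positive constant term
    n1, n2 = neg.sum(), np.sum(neg ** 2)    # moments of negative scores
    c1, c2 = c.sum(), np.sum(c ** 2)
    loss = len(pos) * n2 + 2.0 * c1 * n1 + len(neg) * c2
    grad_pos = -2.0 * (n1 + len(neg) * c)   # d loss / d pos_i
    grad_neg = 2.0 * (len(pos) * neg + c1)  # d loss / d neg_j
    return loss, grad_pos, grad_neg
```

Because every pair contributes to the square loss, the double sum factors exactly, which is why the square loss admits a linear-time gradient while the squared hinge, whose active set depends on the sorted order of the scores, needs the log-linear sort-based computation.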