Measuring the performance of a classifier is a vital task in machine learning. The running time of an algorithm that computes the measure plays a very small role in an offline setting, for example, when the classifier is being developed by a researcher. However, the running time becomes more crucial if our goal is to monitor the performance of a classifier over time. In this paper we study three algorithms for maintaining two measures. The first algorithm maintains area under the ROC curve (AUC) under addition and deletion of data points in $O(\log n)$ time. This is done by maintaining the data points sorted in a self-balanced search tree. In addition, we augment the search tree that allows us to query the ROC coordinates of a data point in $O(\log n)$ time. In doing so we are able to maintain AUC in $O(\log n)$ time. Our next two algorithms involve in maintaining $H$-measure, an alternative measure based on the ROC curve. Computing the measure is a two-step process: first we need to compute a convex hull of the ROC curve, followed by a sum over the convex hull. We demonstrate that we can maintain the convex hull using a minor modification of the classic convex hull maintenance algorithm. We then show that under certain conditions, we can compute the $H$-measure exactly in $O(\log^2 n)$ time, and if the conditions are not met, then we can estimate the $H$-measure in $O((\log n + \epsilon^{-1})\log n)$ time. We show empirically that our methods are significantly faster than the baselines.
翻译:测量分类器的性能是机器学习的关键任务。 计算测量测量量的算法运行时间在离线设置中作用很小, 例如当分类器由研究人员开发时。 但是, 如果我们的目标是在一段时间内监测分类器的性能, 运行时间就变得更为关键 。 在本文中, 我们研究三种算法以维持两个计量。 第一个算法在 ROC 曲线下保留区域, 以 $( log n) 计算数据点 。 这是通过在 自我平衡的搜索树中维持数据点排序。 此外, 我们增加搜索树, 使我们能够在 $(\ log n) 的时间里查询一个数据点的 ROC 坐标 。 这样我们就可以在 $( log n) 的时间里保持 AUC 。 我们下两个算法在 $( $( $) ) 的测量量, 一种基于 ROC 曲线上的替代度是 $( $( $) 美元) 。 计算该算算算出一个两步过程: 首先我们需要在 的 comx comx 里, 如果在 ROCx ral 的 里, 我们可以显示某种 的 的 。 我们的 方向上显示某种 。