$\textit{LogAUC}$, the standard quantitative metric for evaluating enrichment capacity, depends on a cutoff parameter that sets the minimum value of the log-scaled x-axis. Unless this parameter is chosen carefully for a given ROC curve, one of two problems occurs: either (1) some fraction of the first inter-decoy intervals of the ROC curve is discarded and does not contribute to the metric at all, or (2) the very first inter-decoy interval contributes too much to the metric at the expense of all subsequent inter-decoy intervals. We fix this problem by showing a simple way to choose the cutoff parameter based on the number of decoys, which forces the first inter-decoy interval to always have a stable, sensible contribution to the total value. Moreover, we introduce a normalized version of LogAUC, called $\textit{enrichment score}$, which (1) enforces stability by selecting the cutoff parameter in the manner described, (2) yields scores that are more intuitively meaningful, and (3) allows reliable comparison of the enrichment capacities exhibited by different ROC curves, even those produced using different numbers of decoys. Finally, we demonstrate the advantage of enrichment score over unbalanced metrics using data from a real retrospective docking study performed using the program $\textit{DOCK 3.7}$ on the target receptor TRYB1 included in the $\textit{DUDE-Z}$ benchmark.
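To make the cutoff sensitivity concrete, the following is a minimal Python sketch, not the implementation used in this work. It assumes a step-function ROC curve (each decoy moves the curve right by $1/D$, each active moves it up by $1/A$) and the common definition of LogAUC as the area under that curve on a $\log_{10}$-scaled x-axis from the cutoff to 1, divided by the log-width of that range. The names `log_auc` and `first_interval_share` are hypothetical, introduced only for illustration.

```python
import math


def log_auc(labels, cutoff):
    """Area under a step ROC curve on a log10 x-axis, normalized to [0, 1].

    labels: ranked list, best score first; True = active, False = decoy.
    cutoff: minimum x value (fraction of decoys) of the log-scaled axis.
    Assumption: step-function ROC, no interpolation between points.
    """
    n_actives = sum(1 for is_active in labels if is_active)
    n_decoys = len(labels) - n_actives
    if n_actives == 0 or n_decoys == 0:
        raise ValueError("need at least one active and one decoy")
    if not 0.0 < cutoff < 1.0:
        raise ValueError("cutoff must lie strictly between 0 and 1")

    area = 0.0
    actives_seen = 0
    decoys_seen = 0
    for is_active in labels:
        if is_active:
            actives_seen += 1
            continue
        # This decoy closes the inter-decoy interval
        # [decoys_seen / D, (decoys_seen + 1) / D], clipped at the cutoff.
        lo = max(decoys_seen / n_decoys, cutoff)
        hi = (decoys_seen + 1) / n_decoys
        if hi > lo:
            height = actives_seen / n_actives
            area += height * (math.log10(hi) - math.log10(lo))
        decoys_seen += 1
    return area / -math.log10(cutoff)


def first_interval_share(n_decoys, cutoff):
    """Fraction of the log x-axis width covered by the first inter-decoy interval."""
    width = math.log10(1.0 / n_decoys) - math.log10(cutoff)
    return max(0.0, width) / -math.log10(cutoff)
```

Under these assumptions, with 100 decoys a cutoff of 0.01 gives the first inter-decoy interval no weight at all (problem 1), whereas a cutoff of $10^{-6}$ gives it two thirds of the entire log-scaled axis (problem 2). The sketch deliberately does not implement the cutoff selection rule or the enrichment score normalization, since neither formula is stated in this abstract.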