Computational Efficient Approximations of the Concordance Probability in a Big Data Setting

Measuring performance is an essential task once a statistical model has been built. The Area Under the Receiver Operating Characteristic Curve (AUC) is the most popular measure for evaluating the quality of a binary classifier. In the binary case, the AUC equals the concordance probability, a frequently used measure of a model's discriminatory power. Unlike the AUC, the concordance probability also extends to a continuous response variable. Because of the sheer size of today's data sets, determining this measure requires a large number of costly computations and is therefore very time-consuming, particularly for a continuous response variable. We therefore propose two estimation methods that calculate the concordance probability quickly and accurately and that apply to both the discrete and the continuous setting. Extensive simulation studies show that both estimators are accurate and fast to compute. Finally, experiments on two real-life data sets confirm the conclusions of the simulations.
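To make the quantity concrete, here is a minimal sketch of the exact pairwise concordance probability that becomes expensive on large data sets. It is not one of the two estimators proposed in the paper; the function name `concordance_probability` and the convention of counting tied predictions as one half are assumptions made for illustration. For a 0/1 response it reduces to the usual empirical AUC.

```python
from itertools import combinations


def concordance_probability(y, pred):
    """Exact pairwise concordance between responses y and predictions pred.

    Among all pairs with different responses, return the share in which the
    observation with the larger response also has the larger prediction.
    Tied predictions count as 0.5 (an assumption, matching the common AUC
    convention). Cost is O(n^2) in the number of observations.
    """
    concordant = 0.0
    comparable = 0
    for (yi, pi), (yj, pj) in combinations(zip(y, pred), 2):
        if yi == yj:
            continue  # pairs with equal responses are not comparable
        comparable += 1
        if pi == pj:
            concordant += 0.5
        elif (pi > pj) == (yi > yj):
            concordant += 1.0
    if comparable == 0:
        raise ValueError("no pairs with different responses")
    return concordant / comparable


# Binary response: the value coincides with the empirical AUC.
auc_like = concordance_probability([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])

# Continuous response: the same function applies unchanged.
continuous = concordance_probability([1.0, 2.0, 3.0], [0.2, 0.1, 0.9])
```

The double loop over all pairs is what makes the exact computation scale quadratically with the sample size, which is the cost the abstract refers to.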

