This paper introduces the concept of resolving power to describe the capacity of an evaluation metric to discriminate between models of similar quality. This capacity depends on two attributes: (1) the metric's response to improvements in model quality (its signal), and (2) the metric's sampling variability (its noise). The paper defines resolving power as a metric's sampling uncertainty scaled by its signal. Resolving power's primary application is to compare the discriminating capacity of threshold-free evaluation metrics, such as the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). A simulation study compares the AUROC and the AUPRC in a variety of contexts. The analysis suggests that the AUROC generally has greater resolving power, but that the AUPRC is superior in some conditions, such as those where high-quality models are applied to low-prevalence outcomes. The paper concludes by proposing an empirical method to estimate resolving power that can be applied to any dataset and any initial classification model.
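The abstract does not spell out the estimation procedure, but a minimal sketch of the signal-to-noise idea behind resolving power might look like the following. It assumes a binormal score model, takes the signal to be a metric's change under a fixed increment in class separation, and takes the noise to be the metric's bootstrap standard error at the baseline; the helpers simulate_scores, bootstrap_se, and resolving_power are hypothetical names for illustration, not the paper's method.

```python
# Illustrative sketch (assumed setup, not the paper's exact procedure):
# resolving power is computed here as signal / noise, where
#   signal = change in the metric for a fixed gain in model quality
#   noise  = bootstrap standard error of the metric at baseline quality
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

def simulate_scores(n, prevalence, separation, rng):
    """Binormal scores: negatives ~ N(0, 1), positives ~ N(separation, 1)."""
    y = rng.random(n) < prevalence
    scores = rng.normal(0.0, 1.0, n) + separation * y
    return y.astype(int), scores

def bootstrap_se(metric, y, scores, n_boot=500, rng=None):
    """Bootstrap standard error of a metric over resampled (label, score) pairs."""
    rng = rng or np.random.default_rng()
    n, stats = len(y), []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)
        if y[idx].min() == y[idx].max():  # skip resamples with a single class
            continue
        stats.append(metric(y[idx], scores[idx]))
    return float(np.std(stats, ddof=1))

def resolving_power(metric, prevalence, sep_base=1.0, sep_step=0.1,
                    n=20_000, rng=None):
    """Signal-to-noise ratio of a metric for a small quality improvement."""
    rng = rng or np.random.default_rng()
    y0, s0 = simulate_scores(n, prevalence, sep_base, rng)
    y1, s1 = simulate_scores(n, prevalence, sep_base + sep_step, rng)
    signal = metric(y1, s1) - metric(y0, s0)       # response to quality gain
    noise = bootstrap_se(metric, y0, s0, rng=rng)  # sampling variability
    return signal / noise

rng = np.random.default_rng(0)
for prev in (0.5, 0.05):  # balanced vs. low-prevalence outcome
    rp_auroc = resolving_power(roc_auc_score, prev, rng=rng)
    rp_auprc = resolving_power(average_precision_score, prev, rng=rng)
    print(f"prevalence={prev:.2f}  AUROC RP={rp_auroc:.2f}  AUPRC RP={rp_auprc:.2f}")
```

In this toy setup the signal is itself estimated from a single simulated pair of datasets; a fuller study would average over many replicates and vary model quality and prevalence, as the paper's simulation does.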