统计模型的生命周期:示范失败检测、识别和改造 (The Lifecycle of a Statistical Model: Model Failure Detection, Identification, and Refitting) - 专知论文

会员服务 ·

0

统计量 · MoDELS · 模型性能 · Performer · TOOLS ·

2022 年 2 月 8 日

The Lifecycle of a Statistical Model: Model Failure Detection, Identification, and Refitting

翻译：统计模型的生命周期:示范失败检测、识别和改造

Alnur Ali,Maxime Cauchois,John C. Duchi

The statistical machine learning community has demonstrated considerable resourcefulness over the years in developing highly expressive tools for estimation, prediction, and inference. The bedrock assumptions underlying these developments are that the data comes from a fixed population and displays little heterogeneity. But reality is significantly more complex: statistical models now routinely fail when released into real-world systems and scientific applications, where such assumptions rarely hold. Consequently, we pursue a different path in this paper vis-a-vis the well-worn trail of developing new methodology for estimation and prediction. In this paper, we develop tools and theory for detecting and identifying regions of the covariate space (subpopulations) where model performance has begun to degrade, and study intervening to fix these failures through refitting. We present empirical results with three real-world data sets -- including a time series involving forecasting the incidence of COVID-19 -- showing that our methodology generates interpretable results, is useful for tracking model performance, and can boost model performance through refitting. We complement these empirical results with theory proving that our methodology is minimax optimal for recovering anomalous subpopulations as well as refitting to improve accuracy in a structured normal means setting.

翻译：多年来,统计机学习界在开发高清晰度的估算、预测和推算工具方面表现出相当的智慧。这些发展的基本假设是数据来自固定人口,并显示出很少异质性。但现实则更为复杂:统计模型在向现实世界体系和科学应用发布时,通常会失败,而这种假设很少能维持。因此,我们在本文中寻求不同的道路,以发展新的估算和预测方法的老路。在本文中,我们开发了用于探测和确定模型性能开始退化的共变空间(子群)区域的工具和理论,并研究如何通过重新校正来弥补这些失败。我们用三个真实世界数据集介绍实证结果 -- -- 包括预测COVID-19发生率的时间序列 -- -- 表明,我们的方法产生可解释的结果,有助于跟踪模型性能,并通过重新校正来提高模型性能。我们用这些实证结果补充了这些理论,证明我们的方法在恢复形形形貌子群(亚群)方面最优于微速率,通过结构正常手段的精确度加以调整。

0

相关内容

统计量

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

【医学图像处理中的因果性】52页ppt，Causality Matters in Medical Imaging

【医学图像处理中的因果性】52页ppt，Causality Matters in Medical Imaging

专知会员服务

60+阅读 · 2020年3月14日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

开放知识图谱

1+阅读 · 2022年4月4日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

IEEE TII Call For Papers

IEEE TII Call For Papers

CCF多媒体专委会

3+阅读 · 2022年3月24日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

中国图象图形学学会CSIG

0+阅读 · 2021年11月3日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

功能磁共振成像无监督模式分析方法及应用

国家自然科学基金

3+阅读 · 2015年12月31日

复杂制造过程中轮廓数据监控方法研究

国家自然科学基金

1+阅读 · 2013年12月31日

Vlasov-Poisson-Boltzmann方程研究

国家自然科学基金

0+阅读 · 2013年12月31日

Witten Laplacian的特征值及与其相关的Ricci Soliton研究

国家自然科学基金

0+阅读 · 2012年12月31日

心脏植入电子装置早期感染的诊断研究

国家自然科学基金

0+阅读 · 2011年12月31日

基于多尺度随机场模型的高分辨率遥感影像分割方法研究

国家自然科学基金

1+阅读 · 2011年12月31日

极低水平放射性实验中的小概率本底事件评估方法研究

国家自然科学基金

0+阅读 · 2011年12月31日

半监督互信息高光谱图像肿瘤检测与识别

国家自然科学基金

0+阅读 · 2009年12月31日

《物理》期刊

国家自然科学基金

1+阅读 · 2009年12月31日

神经网络子空间学习算法的收敛性与鲁棒性

国家自然科学基金

1+阅读 · 2009年12月31日

Robustness Testing of Data and Knowledge Driven Anomaly Detection in Cyber-Physical Systems

Arxiv

0+阅读 · 2022年4月20日

Constrained Sequence-to-Tree Generation for Hierarchical Text Classification

Arxiv

0+阅读 · 2022年4月19日

Detection and Mitigation of Algorithmic Bias via Predictive Rate Parity

Arxiv

0+阅读 · 2022年4月15日

Scalable and Real-time Multi-Camera Vehicle Detection, Re-Identification, and Tracking

Arxiv

0+阅读 · 2022年4月15日

Unsupervised Probabilistic Models for Sequential Electronic Health Records

Arxiv

0+阅读 · 2022年4月15日

This is the Moment for Probabilistic Loops

Arxiv

0+阅读 · 2022年4月14日

The Causal Learning of Retail Delinquency

Arxiv

15+阅读 · 2020年12月17日

Text Detection and Recognition in the Wild: A Review

Arxiv

20+阅读 · 2020年6月8日

Interpretable machine learning: definitions, methods, and applications

Interpretable machine learning: definitions, methods, and applications

Arxiv

19+阅读 · 2019年1月14日

Generative Adversarial Networks and Probabilistic Graph Models for Hyperspectral Image Classification

Arxiv

11+阅读 · 2018年2月10日

VIP会员

文章信息

相关主题

相关VIP内容

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

【医学图像处理中的因果性】52页ppt，Causality Matters in Medical Imaging

【医学图像处理中的因果性】52页ppt，Causality Matters in Medical Imaging

专知会员服务

60+阅读 · 2020年3月14日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【新书】《知识图谱与大语言模型的协同应用》，544页pdf

军事通信系统：安全行动的支柱

《缓解大语言模型（LLMs）幻觉：面向应用的检索增强生成（RAG）、推理与智能体系统综述》

【新书】机器学习系统，2620页pdf

相关资讯

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

开放知识图谱

1+阅读 · 2022年4月4日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

IEEE TII Call For Papers

IEEE TII Call For Papers

CCF多媒体专委会

3+阅读 · 2022年3月24日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

中国图象图形学学会CSIG

0+阅读 · 2021年11月3日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

相关论文

Robustness Testing of Data and Knowledge Driven Anomaly Detection in Cyber-Physical Systems

Arxiv

0+阅读 · 2022年4月20日

Constrained Sequence-to-Tree Generation for Hierarchical Text Classification

Arxiv

0+阅读 · 2022年4月19日

Detection and Mitigation of Algorithmic Bias via Predictive Rate Parity

Arxiv

0+阅读 · 2022年4月15日

Scalable and Real-time Multi-Camera Vehicle Detection, Re-Identification, and Tracking

Arxiv

0+阅读 · 2022年4月15日

Unsupervised Probabilistic Models for Sequential Electronic Health Records

Arxiv

0+阅读 · 2022年4月15日

This is the Moment for Probabilistic Loops

Arxiv

0+阅读 · 2022年4月14日

The Causal Learning of Retail Delinquency

Arxiv

15+阅读 · 2020年12月17日

Text Detection and Recognition in the Wild: A Review

Arxiv

20+阅读 · 2020年6月8日

Interpretable machine learning: definitions, methods, and applications

Interpretable machine learning: definitions, methods, and applications

Arxiv

19+阅读 · 2019年1月14日

Generative Adversarial Networks and Probabilistic Graph Models for Hyperspectral Image Classification

Arxiv

11+阅读 · 2018年2月10日

相关基金

功能磁共振成像无监督模式分析方法及应用

国家自然科学基金

3+阅读 · 2015年12月31日

复杂制造过程中轮廓数据监控方法研究

国家自然科学基金

1+阅读 · 2013年12月31日

Vlasov-Poisson-Boltzmann方程研究

国家自然科学基金

0+阅读 · 2013年12月31日

Witten Laplacian的特征值及与其相关的Ricci Soliton研究

国家自然科学基金

0+阅读 · 2012年12月31日

心脏植入电子装置早期感染的诊断研究

国家自然科学基金

0+阅读 · 2011年12月31日

基于多尺度随机场模型的高分辨率遥感影像分割方法研究

国家自然科学基金

1+阅读 · 2011年12月31日

极低水平放射性实验中的小概率本底事件评估方法研究

国家自然科学基金

0+阅读 · 2011年12月31日

半监督互信息高光谱图像肿瘤检测与识别

国家自然科学基金

0+阅读 · 2009年12月31日

《物理》期刊

国家自然科学基金

1+阅读 · 2009年12月31日

神经网络子空间学习算法的收敛性与鲁棒性

国家自然科学基金

1+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员