Variance · Noise · Significance · Interaction · Analysis

April 13, 2023

Towards Inferential Reproducibility of Machine Learning Research

Michael Hagmann, Philipp Meier, Stefan Riezler

From arXiv. Published at ICLR 2023 (see https://openreview.net/pdf?id=li4GQCQWkv)

Reliability of machine learning evaluation, that is, the consistency of observed evaluation scores across replicated model training runs, is affected by several sources of nondeterminism which can be regarded as measurement noise. Current tendencies to remove noise in order to enforce reproducibility of research results neglect inherent nondeterminism at the implementation level and disregard crucial interaction effects between algorithmic noise factors and data properties. This limits the scope of conclusions that can be drawn from such experiments. Instead of removing noise, we propose to incorporate several sources of variance, including their interaction with data properties, into an analysis of significance and reliability of machine learning evaluation, with the aim of drawing inferences beyond particular instances of trained models. We show how to use linear mixed effects models (LMEMs) to analyze performance evaluation scores, and how to conduct statistical inference with a generalized likelihood ratio test (GLRT). This allows us to incorporate arbitrary sources of noise, such as meta-parameter variations, into statistical significance testing, and to assess performance differences conditional on data properties. Furthermore, a variance component analysis (VCA) enables the analysis of the contribution of noise sources to overall variance and the computation of a reliability coefficient as the ratio of substantial to total variance.
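
To make the last point of the abstract concrete, the following is a minimal sketch of a variance component analysis and a reliability coefficient for the simplest possible case. It is not the authors' implementation: the paper fits linear mixed effects models, whereas this sketch uses the classical one-way ANOVA (method-of-moments) estimator on a balanced design. The function names, the toy scores, and the choice to treat between-item variance as the "substantial" component and run-to-run variance as noise are all assumptions made for illustration.

```python
from statistics import mean


def variance_components(scores):
    """Estimate variance components for a balanced one-way random-effects design.

    scores[i][j] is the evaluation score of test item i under replicated
    training run j (hypothetical layout, chosen for this sketch).
    Returns (var_item, var_noise).
    """
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a balanced design with >= 2 items and >= 2 runs")

    grand_mean = mean(x for row in scores for x in row)
    item_means = [mean(row) for row in scores]

    # Mean square between items and mean square within items (across runs).
    ms_between = k * sum((m - grand_mean) ** 2 for m in item_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(scores, item_means) for x in row
    ) / (n * (k - 1))

    # Method-of-moments estimates; a negative estimate is truncated at zero.
    var_item = max((ms_between - ms_within) / k, 0.0)
    var_noise = ms_within
    return var_item, var_noise


def reliability(scores):
    """Ratio of item (assumed 'substantial') variance to total variance."""
    var_item, var_noise = variance_components(scores)
    total = var_item + var_noise
    return var_item / total if total > 0 else float("nan")


if __name__ == "__main__":
    # Toy data: 3 test items, each scored under 3 replicated training runs.
    toy_scores = [
        [0.70, 0.72, 0.71],
        [0.50, 0.49, 0.51],
        [0.90, 0.91, 0.89],
    ]
    print(variance_components(toy_scores))
    print(reliability(toy_scores))
```

A coefficient close to 1 means that most of the observed variance comes from differences between items rather than from replicated runs. A coefficient close to 0 means that run-to-run noise dominates. Handling several noise sources and their interactions with data properties, as the abstract describes, would require a full mixed effects model rather than this one-factor estimator.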
