用缺失值预测什么是好的估算? (What's a good imputation to predict with missing values?) - 专知论文

会员服务 ·

0

学成 · CRAFT · 泛函 · 预测器/决策函数 · Better ·

2021 年 6 月 1 日

What's a good imputation to predict with missing values?

翻译：用缺失值预测什么是好的估算?

Marine Le Morvan,Julie Josse,Erwan Scornet,Gaël Varoquaux

How to learn a good predictor on data with missing values? Most efforts focus on first imputing as well as possible and second learning on the completed data to predict the outcome. Yet, this widespread practice has no theoretical grounding. Here we show that for almost all imputation functions, an impute-then-regress procedure with a powerful learner is Bayes optimal. This result holds for all missing-values mechanisms, in contrast with the classic statistical results that require missing-at-random settings to use imputation in probabilistic modeling. Moreover, it implies that perfect conditional imputation may not be needed for good prediction asymptotically. In fact, we show that on perfectly imputed data the best regression function will generally be discontinuous, which makes it hard to learn. Crafting instead the imputation so as to leave the regression function unchanged simply shifts the problem to learning discontinuous imputations. Rather, we suggest that it is easier to learn imputation and regression jointly. We propose such a procedure, adapting NeuMiss, a neural network capturing the conditional links across observed and unobserved variables whatever the missing-value pattern. Experiments confirm that joint imputation and regression through NeuMiss is better than various two step procedures in our experiments with finite number of samples.

翻译：如何对缺少值的数据进行良好的预测? 大多数努力都侧重于首先估算以及可能和第二次对已完成的数据进行可能的预测,以预测结果。然而,这种普遍的做法没有理论依据。我们在这里显示,几乎所有估算功能, 与强健的学习者一起的估算后回归程序都是最佳的。这个结果对所有缺失值机制来说都是最佳的。这与传统的统计结果形成对照, 典型的统计结果要求错失随机假设在概率模型中使用估算。此外, 它意味着, 完全的有条件的估算可能并不需要, 才能很好地预测结果。事实上, 我们显示, 在完全估算的数据中, 最佳回归功能一般都是不连贯的, 这使得它很难学习。巧妙的估算, 以便让回归功能保持不变, 仅仅将问题转向学习不连续的估算。相反, 我们建议, 比较容易学习估算估算和回归的模型。我们建议采用这样的程序, 调整 Neimis, 一个神经网络, 来捕捉到观察到的和未观测过的两步态的路径。

0

相关内容

【干货书】机器学习速查手册，135页pdf

【干货书】机器学习速查手册，135页pdf

专知会员服务

127+阅读 · 2020年11月20日

简明《神经网络数学》手册，16页pdf带你入门，Mathematics of Neural Networks

简明《神经网络数学》手册，16页pdf带你入门，Mathematics of Neural Networks

专知会员服务

68+阅读 · 2020年5月9日

Python分布式计算，171页pdf，Distributed Computing with Python

Python分布式计算，171页pdf，Distributed Computing with Python

专知会员服务

108+阅读 · 2020年5月3日

【MIT】图神经网络的泛化与表示极限，《Generalization and Representational Limits of Graph Neural Networks》

【MIT】图神经网络的泛化与表示极限，《Generalization and Representational Limits of Graph Neural Networks》

专知会员服务

46+阅读 · 2020年2月23日

【ECML-PKDD 2019】基于邻域增强LSTM模型的出租车乘客需求预测（A Neighborhood-augmented LSTM Model for Taxi-Passenger Demand Prediction）

【ECML-PKDD 2019】基于邻域增强LSTM模型的出租车乘客需求预测（A Neighborhood-augmented LSTM Model for Taxi-Passenger Demand Prediction）

专知会员服务

21+阅读 · 2019年12月1日

【经典图书】机器学习基础，427页pdf Foundations of machine learning

【经典图书】机器学习基础，427页pdf Foundations of machine learning

专知会员服务

158+阅读 · 2019年11月14日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

31+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

专知

30+阅读 · 2018年3月22日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

【推荐】决策树/随机森林深入解析

【推荐】决策树/随机森林深入解析

机器学习研究会

5+阅读 · 2017年9月21日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

Confidence intervals of prediction accuracy measures for multivariable prediction models based on the bootstrap-based optimism correction methods

Arxiv

0+阅读 · 2021年7月25日

Differentially Private (Gradient) Expectation Maximization Algorithm with Statistical Guarantees

Differentially Private (Gradient) Expectation Maximization Algorithm with Statistical Guarantees

Arxiv

0+阅读 · 2021年7月22日

Learning Bayesian Networks from Incomplete Data with the Node-Average Likelihood

Arxiv

0+阅读 · 2021年7月22日

Robust low-rank covariance matrix estimation with a general pattern of missing values

Arxiv

0+阅读 · 2021年7月22日

Investigating the performance of multi-objective optimization when learning Bayesian Networks

Arxiv

0+阅读 · 2021年7月20日

Strategies for variable selection in large-scale healthcare database studies with missing covariate and outcome data

Arxiv

0+阅读 · 2021年7月20日

Imitation by Predicting Observations

Imitation by Predicting Observations

Arxiv

4+阅读 · 2021年7月8日

Fictitious GAN: Training GANs with Historical Models

Arxiv

4+阅读 · 2018年3月23日

Link Prediction Based on Graph Neural Networks

Arxiv

26+阅读 · 2018年2月27日

Simplicial Closure and Higher-order Link Prediction

Arxiv

3+阅读 · 2018年2月20日

VIP会员

文章信息

相关主题

预测器/决策函数

相关VIP内容

【干货书】机器学习速查手册，135页pdf

【干货书】机器学习速查手册，135页pdf

专知会员服务

127+阅读 · 2020年11月20日

简明《神经网络数学》手册，16页pdf带你入门，Mathematics of Neural Networks

简明《神经网络数学》手册，16页pdf带你入门，Mathematics of Neural Networks

专知会员服务

68+阅读 · 2020年5月9日

Python分布式计算，171页pdf，Distributed Computing with Python

Python分布式计算，171页pdf，Distributed Computing with Python

专知会员服务

108+阅读 · 2020年5月3日

【MIT】图神经网络的泛化与表示极限，《Generalization and Representational Limits of Graph Neural Networks》

【MIT】图神经网络的泛化与表示极限，《Generalization and Representational Limits of Graph Neural Networks》

专知会员服务

46+阅读 · 2020年2月23日

【ECML-PKDD 2019】基于邻域增强LSTM模型的出租车乘客需求预测（A Neighborhood-augmented LSTM Model for Taxi-Passenger Demand Prediction）

【ECML-PKDD 2019】基于邻域增强LSTM模型的出租车乘客需求预测（A Neighborhood-augmented LSTM Model for Taxi-Passenger Demand Prediction）

专知会员服务

21+阅读 · 2019年12月1日

【经典图书】机器学习基础，427页pdf Foundations of machine learning

【经典图书】机器学习基础，427页pdf Foundations of machine learning

专知会员服务

158+阅读 · 2019年11月14日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

31+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

热门VIP内容

开通专知VIP会员享更多权益服务

《乌克兰无人机产业：志愿者与政策在构建新兴无人机产业中的协同作用》最新报告

《人工智能辅助决策中的数据可视化：系统性综述》

人工智能驱动弹药制造现代化：美国陆军转型之路

《敏捷作战部署中枢纽-辐条基地选址优化研究》80页

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

专知

30+阅读 · 2018年3月22日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

【推荐】决策树/随机森林深入解析

【推荐】决策树/随机森林深入解析

机器学习研究会

5+阅读 · 2017年9月21日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

Confidence intervals of prediction accuracy measures for multivariable prediction models based on the bootstrap-based optimism correction methods

Arxiv

0+阅读 · 2021年7月25日

Differentially Private (Gradient) Expectation Maximization Algorithm with Statistical Guarantees

Differentially Private (Gradient) Expectation Maximization Algorithm with Statistical Guarantees

Arxiv

0+阅读 · 2021年7月22日

Learning Bayesian Networks from Incomplete Data with the Node-Average Likelihood

Arxiv

0+阅读 · 2021年7月22日

Robust low-rank covariance matrix estimation with a general pattern of missing values

Arxiv

0+阅读 · 2021年7月22日

Investigating the performance of multi-objective optimization when learning Bayesian Networks

Arxiv

0+阅读 · 2021年7月20日

Strategies for variable selection in large-scale healthcare database studies with missing covariate and outcome data

Arxiv

0+阅读 · 2021年7月20日

Imitation by Predicting Observations

Imitation by Predicting Observations

Arxiv

4+阅读 · 2021年7月8日

Fictitious GAN: Training GANs with Historical Models

Arxiv

4+阅读 · 2018年3月23日

Link Prediction Based on Graph Neural Networks

Arxiv

26+阅读 · 2018年2月27日

Simplicial Closure and Higher-order Link Prediction

Arxiv

3+阅读 · 2018年2月20日

微信扫码咨询专知VIP会员