Beyond Impute-Then-Regress: Adapting Prediction to Missing Data - Zhuanzhi (专知) Papers

Linear Regression · Linear · Models · Scenario · Inference

October 5, 2022

Beyond Impute-Then-Regress: Adapting Prediction to Missing Data

Dimitris Bertsimas, Arthur Delarue, Jean Pauphilet

Missing values are a common issue in real-world datasets. The gold standard for dealing with missing data in inference is to assume that the data is missing at random and to apply an impute-then-estimate procedure. In this paper, we evaluate the relevance of the assumptions and methods developed in inference for prediction tasks. We first provide a theoretical analysis of impute-then-regress methods and highlight their successes and failures in making accurate predictions. We propose adaptive linear regression, a new class of models that adapt to the set of available features and can be applied directly to partially observed data. We show that adaptive linear regression can be equivalent to impute-then-regress methods in which the imputation and the linear regression models are learned simultaneously instead of sequentially. We leverage this joint-impute-then-regress interpretation to generalize our framework to non-linear models. We validate our theoretical findings and adaptive regression approaches with extensive numerical results on synthetic, semi-synthetic, and real-world datasets. Among other results, in settings where data is strongly not missing at random, our methods achieve a 6% improvement in out-of-sample accuracy.
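The joint-impute-then-regress interpretation can be illustrated in its simplest form. Suppose each missing feature j is filled with a constant μ_j and a linear model ŷ = b + Σ_j β_j x̃_j is fit, where x̃_j = x_j if observed and μ_j otherwise. Writing m_j for the missingness indicator, this expands to ŷ = b + Σ_j β_j (1 − m_j) x_j + Σ_j (β_j μ_j) m_j, which is linear in the zero-imputed features and the indicators. One least-squares fit on those columns therefore learns the slopes and (through the products β_j μ_j) the imputation values together, whereas mean imputation fixes μ_j before the regression ever sees y. The sketch below compares the two on hypothetical synthetic data in which the first feature x_1 is hidden exactly when x_1 > 0.5, a not-missing-at-random mechanism. It is a minimal NumPy illustration under these assumptions and coincides with the familiar zero-imputation-plus-missingness-indicator approach; it is not the authors' adaptive linear regression implementation, and all function names are hypothetical.

```python
import numpy as np

def fit_ols(features, y):
    """Ordinary least squares with an intercept; returns [intercept, coefs...]."""
    design = np.column_stack([np.ones(len(y)), features])
    # lstsq returns the minimum-norm solution, so an all-zero column is harmless.
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict(coef, features):
    return coef[0] + features @ coef[1:]

def mean_impute_then_regress(X_train, y_train, X_test):
    # Sequential pipeline: imputation values (observed column means) are chosen
    # first without looking at y; the regression is fit afterwards.
    means = np.nanmean(X_train, axis=0)

    def fill(X):
        return np.where(np.isnan(X), means, X)

    coef = fit_ols(fill(X_train), y_train)
    return predict(coef, fill(X_test))

def joint_design(X):
    # Columns x_j * (1 - m_j) (zero-imputed values) followed by indicators m_j.
    m = np.isnan(X).astype(float)
    return np.column_stack([np.nan_to_num(X, nan=0.0), m])

def joint_impute_then_regress(X_train, y_train, X_test):
    # A single fit learns beta_j and beta_j * mu_j together, i.e. the imputation
    # values and the regression are learned jointly rather than sequentially.
    coef = fit_ols(joint_design(X_train), y_train)
    return predict(coef, joint_design(X_test)), coef

# Hypothetical MNAR demo: x_1 is hidden exactly when x_1 > 0.5.
rng = np.random.default_rng(0)
n, d = 10_000, 2
X_full = rng.normal(size=(n, d))
y = X_full.sum(axis=1) + 0.1 * rng.normal(size=n)
X = X_full.copy()
X[X[:, 0] > 0.5, 0] = np.nan

train, test = slice(0, n // 2), slice(n // 2, n)
pred_mean = mean_impute_then_regress(X[train], y[train], X[test])
pred_joint, coef = joint_impute_then_regress(X[train], y[train], X[test])

mse_mean = np.mean((y[test] - pred_mean) ** 2)
mse_joint = np.mean((y[test] - pred_joint) ** 2)
# Implied imputation value for x_1: mu_1 = (indicator coefficient) / beta_1.
learned_mu1 = coef[1 + d] / coef[1]
print(f"MSE mean-impute: {mse_mean:.3f}  joint: {mse_joint:.3f}  mu_1: {learned_mu1:.2f}")
```

In this setup, mean imputation fills x_1 with a negative value (the mean of the observed, truncated entries), while the jointly fit model implies an imputation value near E[x_1 | x_1 > 0.5] ≈ 1.14, so its out-of-sample error should be clearly lower. Exact figures depend on the random seed.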

Related content

Linear regression

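Linear regression is a widely used statistical method that applies regression analysis from mathematical statistics to determine the quantitative dependence between two or more variables. Its model is y = w'x + e, where w'x is the inner product of the weight vector w with the feature vector x, and the error e follows a normal distribution with mean 0.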
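A minimal sketch of fitting y = w'x + e by ordinary least squares, assuming NumPy and hypothetical synthetic data. The data has no intercept, to match the formula as written; in practice an intercept is usually added as a constant column of ones.

```python
import numpy as np

# Hypothetical data drawn from y = w'x + e with zero-mean Gaussian noise e.
rng = np.random.default_rng(42)
n, p = 1_000, 3
w_true = np.array([2.0, -1.0, 0.5])
X = rng.normal(size=(n, p))
e = rng.normal(scale=0.1, size=n)  # e ~ N(0, 0.1^2), as in the model
y = X @ w_true + e

# Least squares: w_hat minimizes ||y - X w||^2, i.e. solves X'X w = X'y.
# lstsq solves this stably without explicitly forming X'X.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Same estimate via the normal equations (acceptable for small, well-conditioned p).
w_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Unbiased estimate of the noise variance sigma^2 from the residuals.
residual_var = np.sum((y - X @ w_hat) ** 2) / (n - p)
print(w_hat, residual_var)
```

Both routes give the same estimate here; `lstsq` is generally preferred because forming X'X squares the condition number of the problem.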
