Naive imputation implicitly regularizes high-dimensional linear models - Zhuanzhi (专知) Papers

Tags: linear · regularization term · linear model · biased · predictor/decision function

January 31, 2023

Naive imputation implicitly regularizes high-dimensional linear models

Alexis Ayme, Claire Boyer, Aymeric Dieuleveut, Erwan Scornet

Two different approaches exist to handle missing values for prediction: either imputation, prior to fitting any predictive algorithm, or dedicated methods able to natively incorporate missing values. While imputation is widely (and easily) used, it is unfortunately biased when low-capacity predictors (such as linear models) are applied afterward. However, in practice, naive imputation exhibits good predictive performance. In this paper, we study the impact of imputation in a high-dimensional linear model with data missing completely at random (MCAR). We prove that zero imputation performs an implicit regularization closely related to the ridge method, often used in high-dimensional problems. Leveraging this connection, we establish that the imputation bias is controlled by a ridge bias, which vanishes in high dimension. As a predictor, we argue in favor of the averaged SGD strategy, applied to zero-imputed data. We establish an upper bound on its generalization error, highlighting that imputation is benign in the $d \gg \sqrt{n}$ regime. Experiments illustrate our findings.
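
The ridge connection can be illustrated with a short calculation under simplifying assumptions of our own: $Y = X^\top\beta^\star + \varepsilon$ with $\mathbb{E}[XX^\top] = \Sigma$, and each entry observed independently with probability $\rho$ (homogeneous MCAR), missing entries set to 0. Writing $\tilde X$ for the zero-imputed vector, one gets $\mathbb{E}[\tilde X\tilde X^\top] = \rho^2\Sigma + \rho(1-\rho)\operatorname{diag}(\Sigma)$ and $\mathbb{E}[\tilde X Y] = \rho\Sigma\beta^\star$. Substituting $\theta = \rho\beta$, the imputed risk $\mathbb{E}[(Y - \beta^\top\tilde X)^2]$ equals $\mathbb{E}[(Y - \theta^\top X)^2] + \lambda\,\theta^\top\operatorname{diag}(\Sigma)\,\theta$ with $\lambda = (1-\rho)/\rho$, i.e. a (diagonally weighted) ridge problem on the complete data. The sketch below checks this numerically and also runs single-pass, constant-step averaged SGD on the zero-imputed samples. It is a toy illustration, not the authors' code: the function names (`zero_impute`, `implicit_ridge_target`, `averaged_sgd`), the Toeplitz covariance, the step size $1/(4R^2)$ and all parameter values are hypothetical choices.

```python
import numpy as np


def zero_impute(X, rho, rng):
    """Hypothetical helper: MCAR masking, keep each entry with prob. rho, set the rest to 0."""
    observed = rng.random(X.shape) < rho
    return np.where(observed, X, 0.0)


def implicit_ridge_target(Sigma, beta_star, rho):
    """theta = (Sigma + lam * diag(Sigma))^{-1} Sigma beta_star, with lam = (1 - rho) / rho."""
    lam = (1.0 - rho) / rho
    D = np.diag(np.diag(Sigma))
    return np.linalg.solve(Sigma + lam * D, Sigma @ beta_star)


def averaged_sgd(X, y, step):
    """Single-pass constant-step SGD on the squared loss; returns the Polyak-Ruppert average."""
    beta = np.zeros(X.shape[1])
    beta_bar = np.zeros(X.shape[1])
    for t, (x, target) in enumerate(zip(X, y), start=1):
        beta = beta - step * (x @ beta - target) * x  # gradient of 0.5 * (x . beta - y)^2
        beta_bar += (beta - beta_bar) / t             # running mean of the iterates
    return beta_bar


rng = np.random.default_rng(0)
d, n, rho, noise_sd = 5, 200_000, 0.7, 0.5  # assumed toy values

# Gaussian design with Toeplitz covariance Sigma_ij = 0.5^|i - j| (unit diagonal).
idx = np.arange(d)
Sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])
X = rng.standard_normal((n, d)) @ np.linalg.cholesky(Sigma).T
beta_star = rng.standard_normal(d)
y = X @ beta_star + noise_sd * rng.standard_normal(n)

X_imp = zero_impute(X, rho, rng)

# Least squares on zero-imputed data; rho * beta_imp should approach the ridge target theta.
beta_imp, *_ = np.linalg.lstsq(X_imp, y, rcond=None)
theta = implicit_ridge_target(Sigma, beta_star, rho)

# Averaged SGD on the same zero-imputed stream, step 1 / (4 R^2) with R^2 ~ E||x_imp||^2.
R2 = np.mean(np.sum(X_imp ** 2, axis=1))
beta_sgd = averaged_sgd(X_imp, y, step=1.0 / (4.0 * R2))

print("rho * beta_imp :", np.round(rho * beta_imp, 3))
print("rho * beta_sgd :", np.round(rho * beta_sgd, 3))
print("ridge target   :", np.round(theta, 3))
```

Because $\operatorname{diag}(\Sigma) = I$ for this covariance, the target reduces to ordinary ridge with $\lambda = (1-\rho)/\rho$, which shrinks toward 0 as $\rho \to 1$. This low-dimensional toy only illustrates the equivalence; it does not attempt to reproduce the paper's claim that the ridge bias vanishes in high dimension.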
