Core-Elements for Classical Linear Regression - Zhuanzhi (专知) Papers


Estimation/Estimator · Subsampling · Predictor/Decision Function · Linear Regression · Linear

March 17, 2023

Core-Elements for Classical Linear Regression


Mengyu Li, Jun Yu, Tao Li, Cheng Meng

The coresets approach, also called subsampling or subset selection, aims to select a subsample as a surrogate for the observed sample. Such an approach has been used pervasively in large-scale data analysis. Existing coresets methods construct the subsample using a subset of rows from the predictor matrix. Such methods can be significantly inefficient when the predictor matrix is sparse or numerically sparse. To overcome the limitation, we develop a novel element-wise subset selection approach, called core-elements, for large-scale least squares estimation in classical linear regression. We provide a deterministic algorithm to construct the core-elements estimator, only requiring an $O(\mbox{nnz}(\mathbf{X})+rp^2)$ computational cost, where $\mathbf{X}$ is an $n\times p$ predictor matrix, $r$ is the number of elements selected from each column of $\mathbf{X}$, and $\mbox{nnz}(\cdot)$ denotes the number of non-zero elements. Theoretically, we show that the proposed estimator is unbiased and approximately minimizes an upper bound of the estimation variance. We also provide an approximation guarantee by deriving a coresets-like finite sample bound for the proposed estimator. To handle potential outliers in the data, we further combine core-elements with the median-of-means procedure, resulting in an efficient and robust estimator with theoretical consistency guarantees. Numerical studies on various synthetic and open-source datasets demonstrate the proposed method's superior performance compared to mainstream competitors.
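The element-wise selection idea in the abstract can be illustrated with a small NumPy sketch. The abstract does not spell out the selection rule or the estimator's formula, so two assumptions are made here for illustration only: each column of $\mathbf{X}$ keeps its $r$ largest-magnitude entries (all others are zeroed to form a sparse matrix `X_star`), and the estimate solves $(\mathbf{X}^{*\top}\mathbf{X})\beta = \mathbf{X}^{*\top}\mathbf{y}$. The function name `core_elements_sketch` is hypothetical. Under the model $\mathbf{y} = \mathbf{X}\beta + \varepsilon$ with zero-mean noise, an estimator of this form is unbiased, which is consistent with the abstract's claim, but this should not be read as the authors' exact algorithm.

```python
import numpy as np


def core_elements_sketch(X, y, r):
    """Illustrative element-wise subset-selection least squares.

    Assumptions (not taken from the abstract): keep the r largest-magnitude
    entries of every column of X, zero out the rest to obtain X_star, and
    solve (X_star^T X) beta = X_star^T y.
    """
    n, p = X.shape
    r = min(r, n)
    # Row indices of the r largest |x_ij| within each column, shape (r, p).
    idx = np.argpartition(-np.abs(X), r - 1, axis=0)[:r, :]
    cols = np.arange(p)
    X_star = np.zeros((n, p), dtype=float)
    X_star[idx, cols] = X[idx, cols]
    # Dense products are used for clarity; they do not exploit sparsity.
    beta = np.linalg.solve(X_star.T @ X, X_star.T @ y)
    return beta, X_star


# Noise-free check: with y = X @ beta_true the sketch recovers beta_true
# exactly (up to floating-point error) whenever X_star^T X is invertible.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true
beta_hat, X_star = core_elements_sketch(X, y, r=20)
print(np.allclose(beta_hat, beta_true))
```

This dense version is meant only to show the structure of the computation. Since `X_star` has at most $rp$ non-zero entries, storing it in a sparse format would let the product $\mathbf{X}^{*\top}\mathbf{X}$ be formed in $O(rp^2)$ operations, matching the second term of the cost quoted in the abstract.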

