April 1, 2021

Fast cross-validation for multi-penalty ridge regression


Mark A. van de Wiel, Mirrelijn M. van Nee, Armin Rauschenberger

High-dimensional prediction with multiple data types needs to account for potentially strong differences in predictive signal. Ridge regression is a simple model for high-dimensional data that has challenged the predictive performance of many more complex models and learners, and that allows inclusion of data type specific penalties. The largest challenge for multi-penalty ridge is to optimize these penalties efficiently in a cross-validation (CV) setting, in particular for GLM and Cox ridge regression, which require an additional estimation loop by iterative weighted least squares (IWLS). Our main contribution is a computationally very efficient formula for the multi-penalty, sample-weighted hat-matrix, as used in the IWLS algorithm. As a result, nearly all computations are in low-dimensional space, rendering a speed-up of several orders of magnitude. We developed a flexible framework that facilitates multiple types of response, unpenalized covariates, several performance criteria and repeated CV. Extensions to paired and preferential data types are included and illustrated on several cancer genomics survival prediction problems. Moreover, we present similar computational shortcuts for maximum marginal likelihood and Bayesian probit regression. The corresponding R-package, multiridge, serves as a versatile standalone tool, but also as a fast benchmark for other more complex models and multi-view learners.

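The abstract's claim that nearly all computations happen in low-dimensional space can be illustrated with a small numerical sketch. It is not the paper's exact formula and not the multiridge API. It relies only on the standard push-through (Woodbury) identity for penalized weighted least squares, under the assumption that every covariate is penalized and that each data block b has a single penalty lambda_b. In that case the n x n matrix Gamma = sum_b X_b X_b^T / lambda_b is all that is needed, and the sample-weighted hat matrix is H = Gamma (W^{-1} + Gamma)^{-1}. The function names below are hypothetical, chosen for illustration.

```python
import numpy as np


def block_gram_matrices(blocks):
    """Return the n x n Gram matrix X_b X_b^T for each data block (computed once)."""
    return [Xb @ Xb.T for Xb in blocks]


def weighted_hat_matrix(grams, penalties, weights):
    """Hat matrix H = Gamma (W^{-1} + Gamma)^{-1}, with Gamma = sum_b G_b / lambda_b.

    Assumes all covariates are penalized and all sample weights are positive.
    Only n x n matrices are involved, regardless of the number of covariates.
    """
    gamma = sum(G / lam for G, lam in zip(grams, penalties))
    w_inv = np.diag(1.0 / np.asarray(weights))
    # Gamma and W^{-1} + Gamma are both symmetric, so
    # Gamma @ inv(W^{-1} + Gamma) equals solve(W^{-1} + Gamma, Gamma) transposed.
    return np.linalg.solve(w_inv + gamma, gamma).T


def direct_hat_matrix(blocks, penalties, weights):
    """Reference computation in p-dimensional space: X (X^T W X + Lambda)^{-1} X^T W."""
    X = np.hstack(blocks)
    lam = np.concatenate(
        [np.full(Xb.shape[1], l) for Xb, l in zip(blocks, penalties)]
    )
    W = np.diag(np.asarray(weights))
    return X @ np.linalg.solve(X.T @ W @ X + np.diag(lam), X.T @ W)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 20
    # Two simulated data types with many more covariates than samples.
    blocks = [rng.standard_normal((n, 300)), rng.standard_normal((n, 500))]
    penalties = [5.0, 50.0]
    weights = rng.uniform(0.1, 0.25, n)  # e.g. IWLS weights for logistic ridge

    grams = block_gram_matrices(blocks)
    H_fast = weighted_hat_matrix(grams, penalties, weights)
    H_direct = direct_hat_matrix(blocks, penalties, weights)
    print(np.allclose(H_fast, H_direct))
```

In a cross-validation search over penalties, the Gram matrices would be computed once and reused, so each new candidate penalty vector costs only n x n operations rather than a p x p solve. Handling unpenalized covariates, as the abstract mentions, requires an extension that this sketch does not cover.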
