Group Lasso merger for sparse prediction with high-dimensional categorical data - 专知论文

Sparse · Categorical data · GROUP · Estimation/Estimator · Decomposed

December 21, 2021

Group Lasso merger for sparse prediction with high-dimensional categorical data

Szymon Nowakowski, Piotr Pokarowski, Wojciech Rejchel

Sparse prediction with categorical data is challenging even for a moderate number of variables, because roughly one parameter is needed to encode each category (level). The Group Lasso is a well-known, efficient algorithm for selecting continuous or categorical variables, but the estimates for the levels of a selected factor usually all differ, so the fitted model may not be sparse. To make the Group Lasso solution sparse, we propose merging levels of a selected factor whenever the difference between their estimates is less than a predetermined threshold. We prove that, under weak conditions, our algorithm, called GLAMER (Group LAsso MERger), recovers the true sparse linear or logistic model even in the high-dimensional scenario, that is, when the number of parameters exceeds the learning sample size. To our knowledge, selection consistency has been proven many times for different algorithms fitting sparse models with categorical variables, but our result is the first for the high-dimensional scenario. Numerical experiments show satisfactory performance of GLAMER.
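
The merging idea in the abstract can be illustrated with a minimal sketch. It is not the authors' GLAMER implementation: it assumes the Group Lasso estimates for one selected factor are already available as one scalar per level, and it uses a simplified rule that chains levels together whenever neighbouring sorted estimates differ by less than the threshold (so two levels in the same cluster can still differ by more than the threshold). The names `merge_levels`, `merged_estimates`, and `beta_hat`, and the numbers themselves, are hypothetical.

```python
import numpy as np


def merge_levels(estimates, threshold):
    """Assign a cluster label to each level of one factor.

    Levels are sorted by their estimates; a new cluster starts whenever the
    gap to the previous sorted estimate is at least `threshold`.
    Assumes `estimates` is non-empty.
    """
    estimates = np.asarray(estimates, dtype=float)
    order = np.argsort(estimates, kind="stable")
    gaps = np.diff(estimates[order])
    # Cumulative count of "large" gaps gives the cluster index in sorted order.
    cluster_of_sorted = np.concatenate(([0], np.cumsum(gaps >= threshold)))
    labels = np.empty(len(estimates), dtype=int)
    labels[order] = cluster_of_sorted  # map back to the original level order
    return labels


def merged_estimates(estimates, labels):
    """Replace each level's estimate by the mean of its cluster (an assumption;
    a refit on the merged design would be another option)."""
    estimates = np.asarray(estimates, dtype=float)
    means = np.array(
        [estimates[labels == k].mean() for k in range(labels.max() + 1)]
    )
    return means[labels]


# Hypothetical Group Lasso estimates for a factor with six levels.
beta_hat = [0.02, 1.10, -0.01, 1.05, 0.00, 2.40]
labels = merge_levels(beta_hat, threshold=0.2)
print(labels)
print(merged_estimates(beta_hat, labels))
```

With these illustrative numbers, levels 0, 2, and 4 share one label, levels 1 and 3 share another, and level 5 stays alone, so six per-level parameters collapse to three distinct values.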
