Modelling High-Dimensional Categorical Data Using Nonconvex Fusion Penalties

Categorical data · Nonconvex · Coordinate descent · CASE · Estimation/estimators

13 May 2021

Benjamin G. Stokell, Rajen D. Shah, Ryan J. Tibshirani

From arXiv; 52 pages, 10 figures; to appear in JRSSB

We propose a method for estimation in high-dimensional linear models with nominal categorical data. Our estimator, called SCOPE, fuses levels together by making their corresponding coefficients exactly equal. This is achieved using the minimax concave penalty on differences between the order statistics of the coefficients for a categorical variable, thereby clustering the coefficients. We provide an algorithm for exact and efficient computation of the global minimum of the resulting nonconvex objective in the case with a single variable with potentially many levels, and use this within a block coordinate descent procedure in the multivariate case. We show that an oracle least squares solution that exploits the unknown level fusions is a limit point of the coordinate descent with high probability, provided the true levels have a certain minimum separation; these conditions are known to be minimal in the univariate case. We demonstrate the favourable performance of SCOPE across a range of real and simulated datasets. An R package CatReg implementing SCOPE for linear models and also a version for logistic regression is available on CRAN.
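As a rough illustration of the penalty the abstract describes, the Python sketch below evaluates the minimax concave penalty (MCP) on the gaps between the sorted coefficients of a single categorical variable. The function names `mcp` and `fusion_penalty` and the parameter names `lam` and `gamma` are illustrative assumptions, not taken from the paper or from the CatReg package; the MCP formula itself is the standard one. It only evaluates the penalty and does not attempt the exact minimisation or the block coordinate descent mentioned above.

```python
import numpy as np


def mcp(x, lam, gamma):
    """Standard minimax concave penalty, applied elementwise.

    Equals lam*|x| - x^2/(2*gamma) when |x| <= gamma*lam,
    and the constant gamma*lam^2/2 beyond that point.
    """
    ax = np.abs(np.asarray(x, dtype=float))
    inner = lam * ax - ax ** 2 / (2.0 * gamma)
    cap = 0.5 * gamma * lam ** 2
    return np.where(ax <= gamma * lam, inner, cap)


def fusion_penalty(theta, lam, gamma):
    """Sum of MCP over differences between consecutive order statistics.

    theta holds one coefficient per level of a categorical variable
    (hypothetical input layout, chosen for this sketch).
    """
    sorted_theta = np.sort(np.asarray(theta, dtype=float))
    gaps = np.diff(sorted_theta)  # nonnegative gaps between neighbours
    return float(np.sum(mcp(gaps, lam, gamma)))


if __name__ == "__main__":
    # Three levels fused to one value: every gap is zero.
    print(fusion_penalty([1.0, 1.0, 1.0], lam=1.0, gamma=3.0))
    # Two fused levels and one far away: only the large gap is charged.
    print(fusion_penalty([0.2, 0.2, 5.0], lam=1.0, gamma=3.0))
```

In this sketch, coefficients that are exactly equal contribute nothing, while a gap larger than `gamma * lam` is charged the fixed amount `gamma * lam**2 / 2` regardless of its size. That flat region is what distinguishes a nonconvex penalty like MCP from an absolute-value fusion penalty, which would keep growing with the gap.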
