通过地物选择将高维数据分组 (Clustering High-dimensional Data via Feature Selection) - 专知论文

会员服务 ·

0

簇 · 特征选择 · Analysis · 可辨认的 · Minimax ·

2022 年 10 月 27 日

Clustering High-dimensional Data via Feature Selection

翻译：通过地物选择将高维数据分组

Tianqi Liu,Yu Lu,Biqing Zhu,Hongyu Zhao

from arxiv, Accepted at Biometrics Journal (https://onlinelibrary.wiley.com/doi/epdf/10.1111/biom.13665)

High-dimensional clustering analysis is a challenging problem in statistics and machine learning, with broad applications such as the analysis of microarray data and RNA-seq data. In this paper, we propose a new clustering procedure called Spectral Clustering with Feature Selection (SC-FS), where we first obtain an initial estimate of labels via spectral clustering, then select a small fraction of features with the largest R-squared with these labels, i.e., the proportion of variation explained by group labels, and conduct clustering again using selected features. Under mild conditions, we prove that the proposed method identifies all informative features with high probability and achieves minimax optimal clustering error rate for the sparse Gaussian mixture model. Applications of SC-FS to four real world data sets demonstrate its usefulness in clustering high-dimensional data.

翻译：在统计和机器学习中,高维集群分析是一个具有挑战性的问题,其应用范围很广,例如微阵列数据和RNA-seq数据的分析。在本文中,我们提出了一个新的集群程序,称为 " 以地貌选择群集 " (SC-FS),我们首先通过光谱群集获得对标签的初步估计,然后选择带有这些标签的最大R方位的一小部分特征,即由群落标签解释的变异比例,并利用选定的特征进行群集。在温和的条件下,我们证明拟议的方法确定了所有信息性能高概率的信息性特征,并实现了稀有高斯混合模型最小最佳群集误率。SC-FS对四个真实世界数据集的应用证明了其在将高维数据组合中的有用性。

0

相关内容

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

【干货书】机器学习速查手册，135页pdf

【干货书】机器学习速查手册，135页pdf

专知会员服务

127+阅读 · 2020年11月20日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

深度自进化聚类：Deep Self-Evolution Clustering

深度自进化聚类：Deep Self-Evolution Clustering

我爱读PAMI

15+阅读 · 2019年4月13日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

基于GATA2敲除及GATA2-GFP标记的hES细胞系研究内皮-造血转变的调控机制

国家自然科学基金

0+阅读 · 2015年12月31日

Anderson型多酸的不对称修饰及可控组装研究

国家自然科学基金

1+阅读 · 2014年12月31日

Cr2+,Fe2+掺杂的II-VI晶体中红外激光器研究

国家自然科学基金

0+阅读 · 2012年12月31日

函数域中的Vinogradov中值定理

国家自然科学基金

0+阅读 · 2012年12月31日

用dsDNA微阵列筛选NF-κDNA靶点及靶基因

国家自然科学基金

0+阅读 · 2008年12月31日

Score function-based tests for ultrahigh-dimensional linear models

Arxiv

0+阅读 · 2022年12月16日

Learning particle swarming models from data with Gaussian processes

Learning particle swarming models from data with Gaussian processes

Arxiv

0+阅读 · 2022年12月16日

Learning with Differentiable Algorithms

Arxiv

11+阅读 · 2022年9月1日

MetAug: Contrastive Learning via Meta Feature Augmentation

Arxiv

10+阅读 · 2022年3月10日

Meta-Learning to Cluster

Meta-Learning to Cluster

Arxiv

18+阅读 · 2019年10月30日

VIP会员

文章信息

相关主题

相关VIP内容

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

【干货书】机器学习速查手册，135页pdf

【干货书】机器学习速查手册，135页pdf

专知会员服务

127+阅读 · 2020年11月20日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

生成式人工智能导论：可靠性、负责任开发及实际应用（第二版）

《2025财年美陆军转型倡议（ATI）部队结构与组织提案》

【CMU博士论文】分布偏移下的可信机器学习

智能体 EDA 的曙光：自主数字芯片设计综述

相关资讯

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

深度自进化聚类：Deep Self-Evolution Clustering

深度自进化聚类：Deep Self-Evolution Clustering

我爱读PAMI

15+阅读 · 2019年4月13日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

相关论文

Score function-based tests for ultrahigh-dimensional linear models

Arxiv

0+阅读 · 2022年12月16日

Learning particle swarming models from data with Gaussian processes

Learning particle swarming models from data with Gaussian processes

Arxiv

0+阅读 · 2022年12月16日

Learning with Differentiable Algorithms

Arxiv

11+阅读 · 2022年9月1日

MetAug: Contrastive Learning via Meta Feature Augmentation

Arxiv

10+阅读 · 2022年3月10日

Meta-Learning to Cluster

Meta-Learning to Cluster

Arxiv

18+阅读 · 2019年10月30日

相关基金

基于GATA2敲除及GATA2-GFP标记的hES细胞系研究内皮-造血转变的调控机制

国家自然科学基金

0+阅读 · 2015年12月31日

Anderson型多酸的不对称修饰及可控组装研究

国家自然科学基金

1+阅读 · 2014年12月31日

Cr2+,Fe2+掺杂的II-VI晶体中红外激光器研究

国家自然科学基金

0+阅读 · 2012年12月31日

函数域中的Vinogradov中值定理

国家自然科学基金

0+阅读 · 2012年12月31日

用dsDNA微阵列筛选NF-κDNA靶点及靶基因

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员