Escaping the curse of dimensionality in Bayesian model based clustering - 专知 (Zhuanzhi) Papers


Clusters · Curse of dimensionality · Performer · Oracle · MoDELS

June 23, 2021

Escaping the curse of dimensionality in Bayesian model based clustering


Noirrit Kiran Chandra, Antonio Canale, David B. Dunson

In many applications, there is interest in clustering very high-dimensional data. A common strategy is first-stage dimensionality reduction followed by a standard clustering algorithm, such as k-means. This approach does not target dimension reduction to the clustering objective, and fails to quantify uncertainty. Model-based Bayesian approaches provide an appealing alternative, but often perform poorly in high dimensions, producing too many or too few clusters. This article explains this behavior by studying the clustering posterior in a non-standard setting with fixed sample size and increasing dimensionality. We show that, as dimension grows, the finite-sample posterior tends to assign either every observation to a different cluster or all observations to the same cluster, depending on the kernels and prior specification but not on the true data-generating model. To find models avoiding this pitfall, we define a Bayesian oracle for clustering, with the oracle clustering posterior based on the true values of low-dimensional latent variables. We define a class of LAtent Mixtures for Bayesian (Lamb) clustering that behave equivalently to this oracle as dimension grows. Lamb is shown to perform well in simulation studies and in an application to inferring cell types from scRNAseq data.
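The central claim above is that, with the sample size fixed and the dimension p growing, the clustering posterior collapses to one of two extremes. The toy sketch below is a minimal illustration under our own assumptions, not the authors' analysis: it takes only two observations, a conjugate Gaussian kernel y_i | mu_c ~ N(mu_c, sigma2 * I_p) with mu_c ~ N(0, tau2 * I_p), known sigma2 and tau2, and equal prior weight on the two partitions "same cluster" and "different clusters". Because the coordinates are independent, the log Bayes factor is a sum of p per-coordinate terms, so it drifts roughly linearly in p and the posterior co-clustering probability is pushed toward 0 or 1. The function names (`log_bayes_factor_same_vs_split`, `prob_same_cluster`, `simulate`) and the simulated data-generating settings are hypothetical.

```python
import math
import random


def log_normal_pdf(x: float, var: float) -> float:
    """Log density of a zero-mean univariate normal with variance `var`."""
    return -0.5 * math.log(2.0 * math.pi * var) - x * x / (2.0 * var)


def log_bayes_factor_same_vs_split(y1, y2, sigma2: float, tau2: float) -> float:
    """Log Bayes factor, "same cluster" vs "different clusters", for two
    p-dimensional observations under the (assumed) conjugate model
        y_i | mu_c ~ N(mu_c, sigma2 * I),   mu_c ~ N(0, tau2 * I).
    Coordinates are independent, so per-coordinate evidence adds up.
    """
    total = 0.0
    for a, b in zip(y1, y2):
        # Rotate to d = a - b and u = a + b. The Jacobian is identical under
        # both partitions, so it cancels in the ratio.
        d, u = a - b, a + b
        # Same cluster: d and u are independent with
        # Var(d) = 2*sigma2 and Var(u) = 2*sigma2 + 4*tau2.
        same = log_normal_pdf(d, 2 * sigma2) + log_normal_pdf(u, 2 * sigma2 + 4 * tau2)
        # Different clusters: y1, y2 are independent N(0, sigma2 + tau2),
        # so d and u are independent with variance 2*(sigma2 + tau2).
        v = 2 * (sigma2 + tau2)
        split = log_normal_pdf(d, v) + log_normal_pdf(u, v)
        total += same - split
    return total


def prob_same_cluster(log_bf: float) -> float:
    """Posterior P(same cluster) with equal prior weight on both partitions,
    computed as a numerically stable logistic function of the log Bayes factor."""
    if log_bf >= 0:
        return 1.0 / (1.0 + math.exp(-log_bf))
    z = math.exp(log_bf)
    return z / (1.0 + z)


def simulate(p: int, sigma2: float, tau2: float, shift: float, seed: int = 0) -> float:
    """Hypothetical truth: per coordinate, y1 ~ N(+shift, 1) and y2 ~ N(-shift, 1).
    shift = 0 means both observations truly come from one distribution."""
    rng = random.Random(seed)
    y1 = [rng.gauss(shift, 1.0) for _ in range(p)]
    y2 = [rng.gauss(-shift, 1.0) for _ in range(p)]
    return prob_same_cluster(log_bayes_factor_same_vs_split(y1, y2, sigma2, tau2))


if __name__ == "__main__":
    for p in (1, 10, 100, 1000):
        # Tight kernel (sigma2 = 0.1), data truly from a single N(0, 1).
        tight = simulate(p, sigma2=0.1, tau2=10.0, shift=0.0)
        # Loose kernel (sigma2 = 2), data truly from two shifted normals.
        loose = simulate(p, sigma2=2.0, tau2=10.0, shift=0.5)
        print(f"p={p:5d}  P(same | tight)={tight:.3f}  P(same | loose)={loose:.3f}")
```

In this toy, the direction of the collapse is set by the kernel variance sigma2 rather than by the truth: as p grows, the tight kernel splits two draws from the same N(0, 1) distribution, while the loose kernel merges two draws whose means differ by 1 in every coordinate.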


