Repulsive Mixture Models of Exponential Family PCA for Clustering

The mixture extension of exponential family principal component analysis (EPCA) was designed to encode much more structural information about the data distribution than traditional EPCA does. For example, because EPCA is linear in its essential form, it cannot easily handle nonlinear cluster structures, whereas the mixture extension models them explicitly. However, the traditional mixture of local EPCAs suffers from model redundancy, i.e., overlaps among mixing components, which may cause ambiguity in data clustering. To alleviate this problem, this paper introduces a repulsiveness-encouraging prior among the mixing components and develops a diversified EPCA mixture (DEPCAM) model in the Bayesian framework. Specifically, a determinantal point process (DPP) is exploited as a diversity-encouraging prior distribution over the joint local EPCAs. As required, a matrix-valued measure for the L-ensemble kernel is designed, within which $\ell_1$ constraints are imposed to facilitate selecting the effective PCs of the local EPCAs, and an angular-based similarity measure is proposed. An efficient variational EM algorithm is derived to perform parameter learning and hidden variable inference. Experimental results on both synthetic and real-world datasets confirm the effectiveness of the proposed method in terms of model parsimony and generalization ability on unseen test data.
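To make the idea of a diversity-encouraging DPP prior more concrete, the following is a minimal sketch, not the paper's actual kernel. It assumes each mixing component is summarized by a single direction vector and uses plain cosine similarity as the angular measure; the paper's matrix-valued measure and $\ell_1$ constraints are not reproduced here. The function names `angular_kernel` and `dpp_log_prior` are hypothetical. Under an L-ensemble DPP, the unnormalized log-probability of a set is $\log\det(L)$, so nearly parallel (overlapping) components receive a much lower prior than well-separated ones.

```python
import numpy as np


def angular_kernel(directions):
    """Cosine-similarity Gram matrix between direction vectors.

    Assumption: one direction per mixing component; this is a simplified
    stand-in for the paper's matrix-valued similarity measure.
    """
    d = np.asarray(directions, dtype=float)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)  # unit-normalize rows
    return d @ d.T


def dpp_log_prior(directions):
    """Unnormalized L-ensemble DPP log-probability, log det(L)."""
    L = angular_kernel(directions)
    sign, logdet = np.linalg.slogdet(L)
    # A non-positive determinant means the components are (numerically) collinear.
    return logdet if sign > 0 else -np.inf


# Two orthogonal components versus two nearly parallel ones.
diverse = [[1.0, 0.0], [0.0, 1.0]]
overlapping = [[1.0, 0.0], [0.99, 0.14]]

print("diverse:    ", dpp_log_prior(diverse))
print("overlapping:", dpp_log_prior(overlapping))
```

In this toy setting the orthogonal pair attains the maximum value of the log-determinant, while the nearly parallel pair is penalized heavily, which is the repulsive effect the prior is meant to induce among mixing components.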


Related content

PCA


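In statistics, principal component analysis (PCA) is a method that projects data from a higher-dimensional space into a lower-dimensional space by maximizing the variance along each dimension. Given a set of points in two, three, or more dimensions, a "best-fit" line can be defined as the line that minimizes the average squared distance from the points to the line. The next best-fit line can be chosen in the same way from the directions perpendicular to the first. Repeating this process yields an orthogonal basis in which the individual dimensions of the data are uncorrelated. These basis vectors are called principal components.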
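The procedure described above can be sketched with a singular value decomposition of the centered data. This is a minimal illustration of classical PCA only, not of EPCA or the DEPCAM model; the function name `pca` and the synthetic data are assumptions made for the example.

```python
import numpy as np


def pca(X, n_components):
    """Classical PCA via SVD of the mean-centered data matrix.

    Returns the principal directions (rows), the projected scores,
    and the variance explained along each kept direction.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center each feature
    # Rows of Vt are orthonormal directions, ordered by decreasing variance.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    scores = Xc @ components.T  # coordinates in the new basis
    explained_variance = s[:n_components] ** 2 / (len(X) - 1)
    return components, scores, explained_variance


# Synthetic 3-D data with most of its variance in a 2-D subspace (illustrative only).
rng = np.random.default_rng(0)
mixing = np.array([[3.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 0.1]])
X = rng.normal(size=(200, 3)) @ mixing

components, scores, var = pca(X, 2)
print("explained variance:", var)
print("score covariance:\n", np.cov(scores.T))
```

The printed score covariance is diagonal up to rounding error, which reflects the statement that the data are uncorrelated along the principal components.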
