Mixtures of probabilistic principal component analysis (MPPCA) is a well-known mixture model extension of principal component analysis (PCA). Like PCA, MPPCA assumes that the data samples in each mixture contain homoscedastic noise. However, datasets with heterogeneous noise across samples are becoming increasingly common, as larger datasets are assembled from several sources with varying noise profiles, and MPPCA performs suboptimally on such heteroscedastic data. This paper proposes a heteroscedastic mixtures of probabilistic PCA technique (HeMPPCAT) that uses a generalized expectation-maximization (GEM) algorithm to jointly estimate the unknown underlying factors, means, and noise variances under a heteroscedastic noise setting. Simulation results illustrate that HeMPPCAT achieves better factor estimates and clustering accuracies than MPPCA.
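To make the heteroscedastic setting concrete, the following is a minimal sketch of how data of this kind could be simulated. It is not the HeMPPCAT estimator. It assumes a generative model that the abstract does not spell out: each sample belongs to one of several noise groups, is assigned to a mixture component k at random, and is generated as y = F_k z + mu_k + noise, where z is standard normal and the noise variance depends only on the sample's group. The function name `sample_hetero_mppca` and all parameter names and values are hypothetical.

```python
import random

def sample_hetero_mppca(n_per_group, noise_vars, factors, means, weights, seed=0):
    """Draw samples from an ASSUMED heteroscedastic mixture-of-PPCA model.

    n_per_group[l] : number of samples in noise group l
    noise_vars[l]  : noise variance shared by every sample in group l
    factors[k]     : d x r factor matrix (list of rows) of component k
    means[k]       : length-d mean vector of component k
    weights[k]     : mixing proportion of component k
    """
    rng = random.Random(seed)
    samples, labels, groups = [], [], []
    for l, (n, v) in enumerate(zip(n_per_group, noise_vars)):
        sd = v ** 0.5  # noise standard deviation for this group
        for _ in range(n):
            # Pick a mixture component according to the mixing proportions.
            k = rng.choices(range(len(weights)), weights=weights)[0]
            F, mu = factors[k], means[k]
            r = len(F[0])
            # Latent coefficients z ~ N(0, I_r).
            z = [rng.gauss(0.0, 1.0) for _ in range(r)]
            # y = F z + mu + noise, with group-dependent noise variance.
            y = [sum(F[i][j] * z[j] for j in range(r)) + mu[i] + rng.gauss(0.0, sd)
                 for i in range(len(mu))]
            samples.append(y)
            labels.append(k)
            groups.append(l)
    return samples, labels, groups

# Hypothetical example: two rank-1 components in 2-D, two noise groups.
factors = [[[1.0], [0.0]], [[0.0], [1.0]]]
means = [[0.0, 0.0], [5.0, 5.0]]
Y, labels, groups = sample_hetero_mppca(
    n_per_group=[100, 100],   # 100 clean samples, 100 noisy samples
    noise_vars=[0.01, 1.0],   # group 0 is low-noise, group 1 is high-noise
    factors=factors,
    means=means,
    weights=[0.5, 0.5],
)
```

A model that assumes a single noise variance per mixture component would have to compromise between the clean and noisy groups in data like this, which is the situation the abstract describes as suboptimal for MPPCA.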