Factor-analytic Gaussian mixture models are often employed as a model-based approach to clustering high-dimensional data. Typically, the numbers of clusters and latent factors must be specified in advance of model fitting, and they remain fixed. The pair that optimises some model selection criterion is then chosen. For computational reasons, models in which the number of latent factors differs across clusters are rarely considered. Here the infinite mixture of infinite factor analysers (IMIFA) model is introduced. IMIFA employs a Pitman-Yor process prior to facilitate automatic inference of the number of clusters, using the stick-breaking construction and a slice sampler. Furthermore, IMIFA employs multiplicative gamma process shrinkage priors to allow cluster-specific numbers of factors, which are automatically inferred via an adaptive Gibbs sampler. IMIFA is presented as the flagship of a family of factor-analytic mixture models, providing flexible approaches to clustering high-dimensional data. Applications to a benchmark data set, metabolomic spectral data, and a manifold learning handwritten digit example illustrate the IMIFA model and its advantageous features. These include obviating the need for model selection criteria, reducing the computational burden associated with searching the model space, improving clustering performance by allowing cluster-specific numbers of factors, and quantifying uncertainty in the numbers of clusters and cluster-specific factors.
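As a rough illustration of the two priors named above, the following Python sketch draws (i) truncated Pitman-Yor mixing weights via stick-breaking, with v_k ~ Beta(1 - d, alpha + k d) and pi_k = v_k prod_{l<k}(1 - v_l), and (ii) a loadings matrix under a multiplicative gamma process prior in its commonly used form, where column precisions tau_h are cumulative products of gamma variables so that later columns tend to be shrunk more when a2 > 1. The function names, the fixed truncation level, and all hyperparameter values are illustrative assumptions. This is a prior-sampling sketch only, not the IMIFA sampler, which per the abstract relies on a slice sampler and an adaptive Gibbs sampler rather than a fixed truncation.

```python
import numpy as np


def pitman_yor_weights(alpha, discount, truncation, rng):
    """Truncated stick-breaking draw of Pitman-Yor mixing weights (illustrative)."""
    if not (0.0 <= discount < 1.0) or alpha <= -discount:
        raise ValueError("need 0 <= discount < 1 and alpha > -discount")
    k = np.arange(1, truncation + 1)
    # Stick proportions v_k ~ Beta(1 - d, alpha + k * d).
    v = rng.beta(1.0 - discount, alpha + k * discount)
    # Length of stick remaining before the k-th break: prod_{l<k} (1 - v_l).
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    # Weights sum to slightly less than 1 because of the truncation.
    return v * remaining


def mgp_loadings(p, q, nu, a1, a2, rng):
    """Draw a p x q loadings matrix under a multiplicative gamma process prior."""
    delta = np.empty(q)
    delta[0] = rng.gamma(shape=a1, scale=1.0)
    delta[1:] = rng.gamma(shape=a2, scale=1.0, size=q - 1)
    # Global column precisions tau_h = prod_{l<=h} delta_l.
    tau = np.cumprod(delta)
    # Local precisions phi_jh ~ Gamma(nu/2, rate nu/2), i.e. scale 2/nu.
    phi = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=(p, q))
    # lambda_jh ~ N(0, 1 / (phi_jh * tau_h)); tau broadcasts across rows.
    sd = 1.0 / np.sqrt(phi * tau)
    return rng.normal(0.0, sd)


rng = np.random.default_rng(1)
w = pitman_yor_weights(alpha=1.0, discount=0.25, truncation=50, rng=rng)
L = mgp_loadings(p=20, q=10, nu=3.0, a1=2.0, a2=3.0, rng=rng)
print(w[:5].round(3), w.sum())
# Mean absolute loading per column; later columns tend to be smaller.
print(np.abs(L).mean(axis=0).round(3))
```

A positive discount d gives the heavier-tailed weight decay that distinguishes the Pitman-Yor process from the Dirichlet process (d = 0). In the loadings draw, increasing shrinkage across columns is what lets effectively unused factors be truncated, since their loadings concentrate near zero.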