Finite mixture models are a useful class of statistical models for clustering and density approximation. In the Bayesian framework, finite mixture models require the specification of suitable priors in addition to the data model. These priors help to avoid spurious results and provide a principled way to define cluster shapes and to express a preference for specific cluster solutions. A generic estimation scheme for finite mixtures with a fixed number of components is available using Markov chain Monte Carlo (MCMC) sampling with data augmentation. The posterior allows uncertainty to be assessed in a comprehensive way, but component-specific posterior inference requires resolving the label switching issue. In this paper, we focus on the application of Bayesian finite mixture models to clustering. We start by discussing suitable specification, estimation and inference of the model when the number of components is assumed to be known. We then explain suitable strategies for fitting Bayesian finite mixture models when the number of components is unknown. In addition, all steps required to perform Bayesian finite mixture modeling are illustrated on a data example in which a finite mixture model of multivariate Gaussian distributions is fitted. Suitable prior specification, estimation using MCMC and posterior inference are discussed for this example, both when the number of components is assumed to be known and when it is unknown.