In social science research, uncovering latent structures in populations from survey data with categorical responses is a common and important task. Traditional methods have limitations: Factor Analysis is poorly suited to categorical data, and Latent Class Analysis does not accommodate mixed memberships in latent structures. Moreover, analyzing survey responses with missing values is challenging under either method. This study introduces a Hierarchical Dirichlet Process Mixture of Products of Multinomial Distributions (HDPMPM) model, which leverages the flexibility of nonparametric Bayesian methods to address these limitations. The HDPMPM model allows individuals to belong to multiple latent classes and supports a potentially infinite number of mixture components. It also incorporates missing data imputation directly into the model's Gibbs sampling process. Applying a truncated stick-breaking representation of the Dirichlet process yields a Gibbs sampling scheme for posterior inference. An application of the HDPMPM model to the 2016 American National Election Study (ANES) data demonstrates its effectiveness in identifying political profiles and in handling missing data under both missing at random (MAR) and missing completely at random (MCAR) mechanisms. The results show that the HDPMPM model recovers dominant profiles and captures complex latent structures in survey data, offering social science researchers an alternative tool for categorical data with missing values.
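As a rough illustration of the truncated stick-breaking idea mentioned above, the following minimal sketch draws mixture weights for a single-level Dirichlet process truncated at K components. It is not the HDPMPM sampler itself, which is hierarchical; the function name `truncated_stick_breaking`, the concentration parameter `alpha`, and the truncation level `K` are assumptions introduced here for illustration only.

```python
import random


def truncated_stick_breaking(alpha, K, rng=None):
    """Draw K mixture weights from a stick-breaking prior truncated at K.

    Hypothetical helper for illustration; alpha is the concentration
    parameter and K the truncation level (both assumed, not from the paper).
    """
    rng = rng or random.Random()
    weights = []
    remaining = 1.0  # length of the stick not yet assigned to a component
    for k in range(K):
        # V_k ~ Beta(1, alpha); the final break is fixed at 1 so that the
        # truncated weights sum to one.
        v = 1.0 if k == K - 1 else rng.betavariate(1.0, alpha)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights


weights = truncated_stick_breaking(alpha=1.0, K=10, rng=random.Random(0))
```

Smaller values of `alpha` tend to concentrate mass on the first few components, while larger values spread it across more of them; the truncation level only needs to be large enough that the trailing weights are negligible.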