Multivariate categorical data are routinely collected in many application areas. Because the number of cells in the contingency table grows exponentially with the number of variables, many or even most cells will contain zero observations. This severe sparsity motivates statistical methodologies that effectively reduce the number of free parameters, with penalized log-linear models and latent structure analysis being popular options. This article proposes a fundamentally new class of methods, which we refer to as Mixture of Log Linear models (mills). Combining latent class analysis and log-linear models, mills defines a novel Bayesian methodology for modeling complex multivariate categorical data with flexibility and interpretability. In simulations and in an application investigating the relation between suicide attempts and empathy, mills is shown to have key advantages over alternative methods for contingency tables.
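The sparsity argument can be made concrete with a small simulation. The following is only an illustrative sketch and is not taken from the article: the function name `empty_cell_fraction`, the choice of three levels per variable, the sample size of 1000, and the uniform sampling of observations are all assumptions made here for illustration.

```python
import random
from collections import Counter


def empty_cell_fraction(n_vars, n_levels, n_obs, seed=0):
    """Return (number of cells, fraction of cells with zero observations).

    Assumption for illustration only: each observation is drawn uniformly
    at random over all cells, which is not a model of any real dataset.
    """
    rng = random.Random(seed)
    # Each observation is a tuple of category labels, one per variable.
    counts = Counter(
        tuple(rng.randrange(n_levels) for _ in range(n_vars))
        for _ in range(n_obs)
    )
    n_cells = n_levels ** n_vars  # table size grows exponentially in n_vars
    return n_cells, 1 - len(counts) / n_cells


if __name__ == "__main__":
    for p in (2, 5, 10, 15):
        n_cells, frac = empty_cell_fraction(n_vars=p, n_levels=3, n_obs=1000)
        print(f"{p:2d} variables: {n_cells:>12,d} cells, {frac:.1%} empty")
```

With the sample size held fixed, the number of occupied cells can never exceed the number of observations, so the share of empty cells approaches one as variables are added.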