Modeling binary and categorical data is one of the most common tasks faced by applied statisticians and econometricians. While Bayesian methods for such data have been available for decades, they often demand considerable familiarity with Bayesian statistics or suffer from problems such as low sampling efficiency. To make Bayesian models for binary and categorical data more accessible, we introduce novel latent variable representations based on Pólya-Gamma random variables for a range of commonly encountered logistic regression models. From these latent variable representations, we derive new Gibbs sampling algorithms for binary, binomial, and multinomial logit models. All models admit a conditionally Gaussian likelihood representation, which makes extensions to more complex modeling frameworks, such as state space models, straightforward. However, sampling efficiency may remain an issue in these data-augmentation-based estimation frameworks. To counteract this, we develop and discuss in detail novel marginal data augmentation strategies. The merits of our approach are illustrated through extensive simulations and real data applications.
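For orientation, the sketch below implements the standard Pólya-Gamma data augmentation Gibbs sampler for a binary logit (Polson, Scott and Windle, 2013). It does not implement the novel latent variable representations or the marginal data augmentation strategies described above. It assumes a Gaussian prior β ~ N(0, 100·I). It also draws the Pólya-Gamma variables by truncating their infinite sum-of-gammas representation at 200 terms, which is an approximation chosen for brevity rather than an exact sampler. The function names `sample_pg_approx` and `gibbs_logit` are hypothetical.

```python
import numpy as np


def sample_pg_approx(b, c, rng, n_terms=200):
    """Hypothetical helper: approximate draws from PG(b, c), one per entry of c.

    Uses the infinite-sum representation
        omega = 1/(2 pi^2) * sum_k g_k / ((k - 1/2)^2 + c^2 / (4 pi^2)),
    with g_k ~ Gamma(b, 1), truncated after n_terms terms. Truncation biases
    draws slightly downward; exact samplers exist but are much longer.
    """
    c = np.atleast_1d(np.asarray(c, dtype=float))
    k = np.arange(1, n_terms + 1)
    g = rng.gamma(shape=b, scale=1.0, size=(c.size, n_terms))
    denom = (k - 0.5) ** 2 + c[:, None] ** 2 / (4 * np.pi ** 2)
    return (g / denom).sum(axis=1) / (2 * np.pi ** 2)


def gibbs_logit(y, X, n_iter=1000, prior_var=100.0, seed=0):
    """Hypothetical PG Gibbs sampler for a binary logit.

    Assumed prior: beta ~ N(0, prior_var * I).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    kappa = y - 0.5                      # kappa_i = y_i - 1/2
    prior_prec = np.eye(p) / prior_var
    beta = np.zeros(p)
    draws = np.empty((n_iter, p))
    for t in range(n_iter):
        # Step 1: omega_i | beta ~ PG(1, x_i' beta)
        omega = sample_pg_approx(1.0, X @ beta, rng)
        # Step 2: given omega, the likelihood is Gaussian in beta, so
        # beta | omega, y ~ N(m, V) with
        # V = (X' Omega X + prior_prec)^{-1} and m = V X' kappa.
        V = np.linalg.inv(X.T @ (omega[:, None] * X) + prior_prec)
        V = (V + V.T) / 2                # guard against round-off asymmetry
        m = V @ (X.T @ kappa)
        beta = m + np.linalg.cholesky(V) @ rng.standard_normal(p)
        draws[t] = beta
    return draws


# Usage on simulated data (true coefficients chosen for illustration only).
rng = np.random.default_rng(1)
n = 300
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
beta_true = np.array([-0.5, 1.0])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))
draws = gibbs_logit(y, X, n_iter=600)
post_mean = draws[100:].mean(axis=0)     # discard 100 burn-in draws
print(post_mean)
```

Conditioning on ω makes the likelihood Gaussian in β, so the β update reduces to a single multivariate normal draw. That same conditionally Gaussian structure is what makes extensions to frameworks such as state space models straightforward.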