Discrete data are abundant and often arise as counts or rounded data. These data commonly exhibit complex distributional features such as zero-inflation, over- or under-dispersion, boundedness, and heaping, which render many parametric models inadequate. Yet even for parametric regression models, conjugate priors and closed-form posteriors are typically unavailable, which necessitates approximations such as MCMC for posterior inference. This paper introduces a Bayesian modeling and algorithmic framework that enables semiparametric regression analysis for discrete data with Monte Carlo (not MCMC) sampling. The proposed approach pairs a nonparametric marginal model with a latent linear regression model to encourage both flexibility and interpretability, and delivers posterior consistency even under model misspecification. For a parametric or large-sample approximation of this model, we identify a class of conjugate priors with (pseudo) closed-form posteriors. All posterior and predictive distributions are available analytically or via Monte Carlo sampling. These tools are broadly useful for linear regression, nonlinear models via basis expansions, and variable selection with discrete data. Simulation studies demonstrate significant advantages in computing, prediction, estimation, and selection relative to existing alternatives. This novel approach is applied to self-reported mental health data that exhibit zero-inflation, overdispersion, boundedness, and heaping.