Discrete data are abundant and often arise as counts or rounded data. These data commonly exhibit complex distributional features such as zero-inflation, over- or under-dispersion, boundedness, and heaping, which render many parametric models inadequate. Yet even for parametric regression models, approximations such as MCMC are typically needed for posterior inference. This paper introduces a Bayesian modeling and algorithmic framework that enables semiparametric regression analysis for discrete data with Monte Carlo (not MCMC) sampling. The proposed approach pairs a nonparametric marginal model with a latent linear regression model to encourage both flexibility and interpretability, and delivers posterior consistency even under model misspecification. For a parametric or large-sample approximation of this model, we identify a class of conjugate priors with (pseudo) closed-form posteriors. All posterior and predictive distributions are available analytically or via Monte Carlo sampling. These tools are broadly useful for linear regression, nonlinear models via basis expansions, and variable selection with discrete data. Simulation studies demonstrate significant advantages in computing, prediction, estimation, and selection relative to existing alternatives. This novel approach is applied to self-reported mental health data that exhibit zero-inflation, overdispersion, boundedness, and heaping.