Large crossed mixed effects models with imbalanced structures and missing data pose major computational challenges for standard Bayesian posterior sampling algorithms, because their computational complexity is usually superlinear in the number of observations. We propose two efficient subset-based stochastic gradient MCMC algorithms for such crossed mixed effects models, which facilitate scalable inference on both the variance components and the regression coefficients. The first algorithm is developed for balanced designs without missing observations, where we leverage the closed-form expression of the precision matrix for the full data matrix. The second algorithm, which we call pigeonhole stochastic gradient Langevin dynamics (PSGLD), is developed for both balanced and unbalanced designs with a potentially large proportion of missing observations. At each MCMC iteration, PSGLD imputes the latent crossed random effects by running short Markov chains and then samples the model parameters, namely the variance components and regression coefficients. We provide theoretical guarantees by showing that the output distribution of the proposed algorithms converges to the target non-log-concave posterior distribution. Numerical experiments on both synthetic and real data demonstrate that the proposed algorithms significantly reduce the computational cost of standard MCMC algorithms and better balance approximation accuracy against computational efficiency.
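For readers unfamiliar with the building block behind these algorithms, the following is a minimal sketch of a generic stochastic gradient Langevin dynamics (SGLD) update on a subset of the data. It is not the PSGLD algorithm described above: there is no crossed random effects structure and no imputation step. The toy model (Gaussian observations with unknown mean and a N(0, 10^2) prior), the function names `sgld_step` and `run_toy_example`, and all tuning values are assumptions chosen only for illustration.

```python
import numpy as np


def sgld_step(theta, grad_log_prior, grad_log_lik, data, batch_size, step_size, rng):
    """One generic SGLD update using a random subset of the data."""
    n = len(data)
    idx = rng.choice(n, size=batch_size, replace=False)
    # Rescale the subset gradient by n / batch_size so that it is an
    # unbiased estimate of the full-data log-likelihood gradient.
    grad = grad_log_prior(theta) + (n / batch_size) * grad_log_lik(theta, data[idx]).sum(axis=0)
    # Langevin noise with variance equal to the step size.
    noise = rng.normal(size=theta.shape) * np.sqrt(step_size)
    return theta + 0.5 * step_size * grad + noise


def run_toy_example(seed=0, n=1000, batch_size=50, step_size=1e-4,
                    n_iter=5000, burn_in=1000):
    """Toy model (assumed for illustration): y_i ~ N(mu, 1), mu ~ N(0, 10^2)."""
    rng = np.random.default_rng(seed)
    y = rng.normal(loc=2.0, scale=1.0, size=n)

    def grad_log_prior(theta):
        return -theta / 100.0  # gradient of log N(theta | 0, 10^2)

    def grad_log_lik(theta, batch):
        return batch[:, None] - theta  # per-observation gradients, shape (b, 1)

    theta = np.zeros(1)
    samples = []
    for t in range(n_iter):
        theta = sgld_step(theta, grad_log_prior, grad_log_lik,
                          y, batch_size, step_size, rng)
        if t >= burn_in:
            samples.append(theta.copy())
    return np.array(samples), y.mean()


if __name__ == "__main__":
    samples, ybar = run_toy_example()
    print("SGLD posterior mean estimate:", samples.mean())
    print("sample mean of the data:", ybar)
```

Each iteration touches only `batch_size` observations, which is the sense in which subset-based stochastic gradient samplers avoid the superlinear per-iteration cost of standard MCMC. With a fixed step size the sampler targets the posterior only approximately.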