Markov chain Monte Carlo (MCMC) sampling is computationally expensive, especially for complex models. Alternative methods reduce this burden by making simplifying assumptions about the posterior, but their impact on predictive performance remains unclear. This paper compares MCMC and non-MCMC methods for high-dimensional penalized regression, examining when such computational shortcuts are justified for prediction tasks. We conduct a comprehensive simulation study using high-dimensional tabular data, then validate the findings on empirical datasets with both continuous and binary outcomes. An in-depth analysis of one dataset serves as a step-by-step tutorial implementing the various algorithms in R. Our results show that mean-field variational inference (VI) consistently performs comparably to MCMC methods. In simulations, mean-field VI exhibited 3-90\% higher MSE across scenarios while reducing runtime by 7-30x compared to Hamiltonian Monte Carlo. On the empirical datasets, some cases showed dramatic speed-ups (100-400x) with similar or superior predictive performance. However, performance varied: in other cases MSE increased more than 100-fold for only a 30x speed-up, highlighting the context-dependent nature of these trade-offs.