BayesSUR: An R package for high-dimensional multivariate Bayesian variable and covariance selection in linear regression

In molecular biology, advances in high-throughput technologies have made it possible to study complex multivariate phenotypes and their simultaneous associations with high-dimensional genomic and other omics data. This problem can be studied with high-dimensional multi-response regression, where the response variables are potentially highly correlated. For this purpose, we recently introduced several multivariate Bayesian variable and covariance selection models, e.g., Bayesian estimation methods for sparse seemingly unrelated regression for variable and covariance selection. Several variable selection priors have been implemented in this context, in particular the hotspot detection prior for latent variable inclusion indicators, which results in sparse variable selection for associations between predictors and multiple phenotypes. We also propose an alternative that uses a Markov random field (MRF) prior to incorporate prior knowledge about the dependence structure of the inclusion indicators. Inference for Bayesian seemingly unrelated regression (SUR) by Markov chain Monte Carlo methods is made computationally feasible by factorisation of the covariance matrix amongst the response variables. In this paper we present BayesSUR, an R package that allows the user to easily specify and run a range of different Bayesian SUR models, which have been implemented in C++ for computational efficiency. The R package allows the models to be specified in a modular way, where the user chooses the priors for variable selection and for covariance selection separately. We demonstrate the performance of sparse SUR models with the hotspot prior and spike-and-slab MRF prior on synthetic and real data sets representing eQTL or mQTL studies and in vitro anti-cancer drug screening studies as examples of typical applications.
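
For orientation, the model class described above can be sketched in standard SUR notation. The notation is an assumption introduced here for illustration and is not taken from the abstract: n samples, p predictors, s responses, coefficient matrix B, residual matrix U, residual covariance C, inclusion indicators γ_jk, and hyperparameters ω_jk, o_k, π_j, d, e, G. The exact parameterisation used by the package may differ.

```latex
% Assumed notation (illustrative only):
%   Y: n x s responses, X: n x p predictors, B: p x s coefficients,
%   U: n x s residuals whose rows share the s x s covariance matrix C.
\begin{align*}
\mathbf{Y} &= \mathbf{X}\mathbf{B} + \mathbf{U},
  \qquad \operatorname{vec}(\mathbf{U}) \sim \mathcal{N}\!\left(\mathbf{0},\; C \otimes \mathbb{I}_n\right), \\
% Latent inclusion indicator: predictor j is associated with response k.
\gamma_{jk} &= \mathbb{1}\{\beta_{jk} \neq 0\},
  \qquad j = 1,\dots,p,\quad k = 1,\dots,s, \\
% Hotspot-type prior: a response-specific sparsity level o_k times a
% predictor-specific propensity pi_j to be a hotspot.
\gamma_{jk} \mid \omega_{jk} &\sim \operatorname{Bernoulli}(\omega_{jk}),
  \qquad \omega_{jk} = o_k\,\pi_j, \\
% MRF-type prior on gamma = vec(Gamma): d controls overall sparsity,
% e the strength of the dependence encoded in the graph matrix G.
p(\boldsymbol{\gamma} \mid d, e, G) &\propto
  \exp\!\left\{ d\,\mathbf{1}^{\top}\boldsymbol{\gamma}
  + e\,\boldsymbol{\gamma}^{\top} G\, \boldsymbol{\gamma} \right\}.
\end{align*}
```

In this sketch, variable selection acts on the indicators γ_jk, while covariance selection acts on C (for instance through a sparse prior on its inverse), which is why the two prior choices can be made separately.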
