Structured Bayesian variable selection for multiple correlated response variables and high-dimensional predictors

It is becoming increasingly common in biomedicine to study complex associations between multiple phenotypes and high-dimensional genomic features. When the response variables are correlated with one another and the high-dimensional predictors are correlated as well, such studies require flexible and efficient joint statistical models. We propose a structured multivariate Bayesian variable selection model to identify sparse predictors associated with multiple correlated response variables. The approach makes use of known structure information between the multiple response variables and high-dimensional predictors via a Markov random field (MRF) prior for the latent indicator variables of the coefficient matrix of a sparse seemingly unrelated regressions (SSUR) model. The structure information included in the MRF prior can improve model performance (i.e., variable selection and response prediction) compared to other common priors. In addition, we employ random effects to capture heterogeneity among grouped samples. The proposed approach is validated by simulation studies and applied to a pharmacogenomic study that includes pharmacological profiling and multi-omics data (i.e., gene expression, copy number variation and mutation) from in vitro anti-cancer drug sensitivity screening.
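
To illustrate how an MRF prior can encode known structure among latent indicator variables, here is a minimal sketch. It assumes a commonly used MRF form in which the log prior of the vectorised binary indicator matrix is, up to a constant, `d * sum(gamma) + e * gamma^T G gamma`. The abstract does not state this parameterisation, so the function name, the hyperparameters `d` and `e`, and the toy adjacency matrix `G` are all illustrative assumptions.

```python
import numpy as np

def mrf_log_prior_unnormalized(gamma, G, d, e):
    """Unnormalized log MRF prior (assumed form): d * 1^T gamma + e * gamma^T G gamma.

    gamma : binary vector of latent inclusion indicators (vectorised matrix)
    G     : symmetric adjacency matrix encoding known links between indicators
    d     : sparsity parameter (more negative means fewer inclusions a priori)
    e     : strength of the structure information (e > 0 rewards linked inclusions)
    """
    gamma = np.asarray(gamma, dtype=float)
    return d * gamma.sum() + e * gamma @ G @ gamma

# Toy setting (hypothetical): 3 predictors x 2 responses gives 6 indicators,
# stacked column by column. One known link: predictor 1 for response 1
# (index 0) and predictor 1 for response 2 (index 3).
G = np.zeros((6, 6))
G[0, 3] = G[3, 0] = 1.0

d, e = -2.0, 1.5
linked = np.array([1, 0, 0, 1, 0, 0])    # two inclusions joined by an edge in G
unlinked = np.array([1, 0, 0, 0, 1, 0])  # two inclusions with no edge between them

lp_linked = mrf_log_prior_unnormalized(linked, G, d, e)
lp_unlinked = mrf_log_prior_unnormalized(unlinked, G, d, e)
print(lp_linked, lp_unlinked)
```

Under these assumptions, both configurations include two predictors, but the one whose inclusions are joined by an edge in `G` receives higher prior mass, which is how the structure information can steer variable selection.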
