High-dimensional data often arise in clinical genomics research, where the goal is to identify relevant predictors of a particular trait. One way to improve predictive performance is to include information on the predictors derived from prior knowledge or previous studies. Such information is also referred to as ``co-data''. To this end, we develop a novel Bayesian model for including co-data in a high-dimensional regression framework, called Informative Horseshoe regression (infHS). The proposed approach regresses the prior variances of the regression parameters on the co-data variables, improving variable selection and prediction. We implement both a Gibbs sampler and a variational approximation algorithm. The former is suited to applications of moderate dimension that, besides prediction, target posterior inference, whereas the computational efficiency of the latter allows handling a very large number of variables. We show the benefits of including co-data in a simulation study. Finally, we demonstrate that infHS outperforms competing approaches in two genomics applications.
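
To make the idea of regressing prior variances on co-data concrete, the following is a minimal sketch and not the infHS implementation. It assumes a log-linear link between the co-data and the prior variance on top of standard horseshoe half-Cauchy local scales; this parameterization, the function name `sample_codata_horseshoe`, and the symbols `Z`, `gamma`, and `tau` are illustrative assumptions that do not come from the abstract.

```python
import numpy as np

def sample_codata_horseshoe(Z, gamma, tau=1.0, seed=0):
    """Draw coefficients from a horseshoe-type prior with co-data-dependent scales.

    Assumed (hypothetical) form, chosen for illustration only:
        beta_j ~ N(0, tau^2 * lambda_j^2 * exp(z_j' gamma)),
        lambda_j ~ half-Cauchy(0, 1).
    """
    rng = np.random.default_rng(seed)
    Z = np.asarray(Z, dtype=float)
    p = Z.shape[0]
    # Co-data regression on the log prior variance (assumed log-linear link).
    log_var_shift = Z @ np.asarray(gamma, dtype=float)
    # Half-Cauchy local scales, as in the standard horseshoe prior.
    lam = np.abs(rng.standard_cauchy(size=p))
    prior_sd = tau * lam * np.exp(0.5 * log_var_shift)
    return prior_sd * rng.standard_normal(size=p)

# Example: binary co-data marking a group of predictors believed to be more relevant.
p = 2000
group = np.repeat([0.0, 1.0], p // 2)
Z = np.column_stack([np.ones(p), group])  # intercept column plus group indicator
beta = sample_codata_horseshoe(Z, gamma=[0.0, 4.0], tau=0.1)
print(np.median(np.abs(beta[group == 0])), np.median(np.abs(beta[group == 1])))
```

Under these assumptions, predictors flagged by the co-data receive a larger prior variance and are therefore shrunk less, which is the mechanism by which co-data could aid variable selection.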