The recent explosion of genetic data and high-dimensional biobank and 'omic' data has given researchers the opportunity to investigate the shared genetic origin (pleiotropy) of hundreds to thousands of related phenotypes. However, existing methods for multi-phenotype genome-wide association studies (GWAS) either do not model pleiotropy, apply only to a small number of phenotypes, or provide no way to perform inference. Further complicating matters, raw genetic and phenotype data are rarely available, so analyses must be performed on GWAS summary statistics, whose statistical properties in high dimensions are poorly understood. We therefore developed a novel model, theoretical framework, and set of methods for Bayesian inference in GWAS of high-dimensional phenotypes using summary statistics. These methods explicitly model pleiotropy, enable fast computation, and facilitate the use of biologically informed priors. We demonstrate the utility of our procedure by applying it to metabolite GWAS, for which we develop new nonparametric priors for genetic effects on metabolite levels that use known metabolic pathway information and foster interpretable inference at the pathway level.