Polygenic risk scores (PRS) developed from genome-wide association studies (GWAS) are of increasing interest for clinical and research applications. Bayesian methods have been popular for building PRS because of their natural ability to regularize models and incorporate external information. In this article, we present new theoretical results, methods, and extensive numerical studies to advance Bayesian methods for PRS applications. We identify a potential risk, under a common Bayesian PRS framework, of posterior impropriety when the required GWAS summary statistics and linkage disequilibrium (LD) data come from two distinct sources. As a principled remedy, we propose a projection of the summary statistics that ensures compatibility between the two sources and, in turn, a proper posterior. We further introduce a new PRS method, with an accompanying software package, based on the less-explored Bayesian bridge prior, which models varying sparsity levels in effect size distributions more flexibly. We extensively benchmark it against alternative Bayesian methods using both synthetic and real datasets, quantifying the impact of both prior specification and LD estimation strategy. Our proposed PRS-Bridge, equipped with the projection technique and flexible prior, demonstrates the most consistent and generally superior performance across a variety of scenarios.