Large-scale genome-wide association studies (GWAS) offer an exciting opportunity to discover putative causal genes or risk factors associated with diseases by using SNPs as instrumental variables (IVs). However, conventional approaches assume linear causal relations, partly for simplicity and partly because often only GWAS summary data are available. In this work, we propose a novel model for transcriptome-wide association studies (TWAS) that incorporates nonlinear relationships across IVs, an exposure, and an outcome; the model is robust against violations of the valid IV assumptions and permits the use of GWAS summary data. We decouple the estimation of a marginal causal effect from that of a nonlinear transformation: the former is estimated via sliced inverse regression and a sparse instrumental variable regression, and the latter by a ratio-adjusted inverse regression. On this basis, we propose an inferential procedure. An application of the proposed method to the ADNI gene expression data and the IGAP GWAS summary data identifies 18 causal genes associated with Alzheimer's disease, including APOE and TOMM40, in addition to 7 other genes missed by two-stage least squares, which considers only linear relationships. Our findings suggest that nonlinear modeling is required to unleash the power of IV regression for identifying potentially nonlinear gene-trait associations. Accompanying this paper is our Python library nl-causal (https://github.com/nl-causal/nonlinear-causal), which implements the proposed method.
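For orientation, the linear baseline named in the abstract, two-stage least squares, can be sketched on simulated individual-level data. This is a minimal illustration under stated assumptions: it is not the proposed nonlinear method and does not use the nl-causal API. The function names, the simulation design (10 SNPs, a single unmeasured confounder, a true linear effect of 0.5), and all parameter values are hypothetical choices made for this example.

```python
import numpy as np


def two_stage_least_squares(Z, x, y):
    """Linear 2SLS estimate of the effect of exposure x on outcome y,
    using the columns of Z (e.g. SNP genotypes) as instruments."""
    # Center everything so that no intercept term is needed.
    Z = Z - Z.mean(axis=0)
    x = x - x.mean()
    y = y - y.mean()
    # Stage 1: regress the exposure on the instruments.
    theta, *_ = np.linalg.lstsq(Z, x, rcond=None)
    x_hat = Z @ theta
    # Stage 2: regress the outcome on the predicted exposure.
    return float(x_hat @ y / (x_hat @ x_hat))


def ols_slope(x, y):
    """Naive regression of y on x, ignoring confounding."""
    x = x - x.mean()
    y = y - y.mean()
    return float(x @ y / (x @ x))


def simulate(n=20000, p=10, beta=0.5, seed=0):
    """Hypothetical data: SNPs Z, exposure x, outcome y, confounder u."""
    rng = np.random.default_rng(seed)
    Z = rng.binomial(2, 0.3, size=(n, p)).astype(float)  # allele counts 0/1/2
    u = rng.normal(size=n)                  # unmeasured confounder
    theta = rng.uniform(0.2, 0.5, size=p)   # SNP effects on the exposure
    x = Z @ theta + u + rng.normal(size=n)
    y = beta * x + u + rng.normal(size=n)   # linear causal effect (assumed)
    return Z, x, y


Z, x, y = simulate()
beta_ols = ols_slope(x, y)
beta_2sls = two_stage_least_squares(Z, x, y)
print(f"OLS estimate:  {beta_ols:.3f}")
print(f"2SLS estimate: {beta_2sls:.3f}")
```

In this simulation the confounder pushes the naive regression slope away from the true effect, while the 2SLS estimate stays close to it. The sketch assumes valid instruments and a linear exposure-outcome relationship, which are the two assumptions the abstract's method is designed to relax.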