Mendelian randomization is the use of genetic variants to assess whether a risk factor has a causal effect on an outcome of interest. Here, we focus on two-sample summary-data Mendelian randomization analyses with many correlated variants from a single gene region, and particularly on cis-Mendelian randomization studies which use protein expression as a risk factor. Such studies must rely on a small, curated set of variants from the studied region; using all variants in the region requires inverting an ill-conditioned genetic correlation matrix and results in numerically unstable causal effect estimates. We review methods for variable selection and estimation in cis-Mendelian randomization with summary-level data, ranging from stepwise pruning and conditional analysis to principal components analysis, factor analysis and Bayesian variable selection. In a simulation study, we show that the various methods perform comparably in analyses with large sample sizes and strong genetic instruments. However, when weak instrument bias is suspected, factor analysis and Bayesian variable selection produce more reliable inferences than the simple pruning approaches that are often used in practice. We conclude by examining two case studies, assessing the effects of LDL-cholesterol and serum testosterone on coronary heart disease risk using variants in the HMGCR and SHBG gene regions, respectively.
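To make the numerical issue concrete, the sketch below shows one common way a causal effect is estimated from summary statistics on correlated variants: a generalised inverse-variance weighted (IVW) estimate, which needs the inverse of a matrix built from the genetic correlation matrix. It also shows a simple greedy pruning step of the kind the abstract refers to. This is an illustration under stated assumptions, not the authors' implementation: the function names (`ivw_correlated`, `greedy_prune`), the pruning threshold and the synthetic inputs are all hypothetical, and the estimator is assumed to take the form theta = (b_X' Omega^-1 b_Y) / (b_X' Omega^-1 b_X), with Omega_ij = se_Y,i se_Y,j rho_ij.

```python
import numpy as np

def ivw_correlated(beta_x, beta_y, se_y, rho):
    """Generalised IVW estimate for correlated variants (assumed form).

    beta_x, beta_y: variant-exposure and variant-outcome associations.
    se_y: standard errors of beta_y.  rho: variant correlation (LD) matrix.
    """
    omega = np.outer(se_y, se_y) * rho           # Omega_ij = se_i * se_j * rho_ij
    omega_inv_bx = np.linalg.solve(omega, beta_x)  # Omega^-1 b_X without an explicit inverse
    denom = beta_x @ omega_inv_bx
    estimate = (omega_inv_bx @ beta_y) / denom     # Omega is symmetric
    std_error = np.sqrt(1.0 / denom)
    return estimate, std_error

def greedy_prune(beta_x, se_x, rho, r2_threshold=0.1):
    """Keep variants in order of |z-score|, skipping any variant whose
    squared correlation with an already kept variant exceeds the threshold."""
    order = np.argsort(-np.abs(beta_x / se_x))
    kept = []
    for j in order:
        if all(rho[j, k] ** 2 <= r2_threshold for k in kept):
            kept.append(int(j))
    return np.array(sorted(kept))

if __name__ == "__main__":
    # Synthetic inputs for illustration only: 30 variants with AR(1)-style LD.
    rng = np.random.default_rng(0)
    p = 30
    idx = np.arange(p)
    rho = 0.99 ** np.abs(idx[:, None] - idx[None, :])
    beta_x = rng.normal(0.1, 0.02, size=p)
    se_x = np.full(p, 0.01)
    se_y = np.full(p, 0.02)
    beta_y = 0.5 * beta_x + rng.normal(0.0, 0.02, size=p)

    print("condition number of LD matrix:", np.linalg.cond(rho))
    print("all variants:", ivw_correlated(beta_x, beta_y, se_y, rho))
    keep = greedy_prune(beta_x, se_x, rho)
    print("pruned set:", ivw_correlated(beta_x[keep], beta_y[keep],
                                        se_y[keep], rho[np.ix_(keep, keep)]))
```

Printing the condition number of the correlation matrix before and after pruning is a quick way to see why a strongly correlated variant set is fragile. Pruning improves the conditioning by discarding variants, which is the trade-off that dimension-reduction and Bayesian selection methods aim to soften.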