Mendelian randomization (MR) is a widely used method to estimate the causal relationship between a risk factor and a disease. A fundamental part of any MR analysis is choosing appropriate genetic variants as instrumental variables. Genome-wide association studies often reveal that hundreds of genetic variants may be robustly associated with a risk factor, but in some situations investigators may believe that only a smaller subset of these variants are valid instruments. Nevertheless, using the full set of instruments could lead to biased but more precise estimates, and therefore in terms of mean squared error it may be unclear which set of instruments is optimal. For this purpose, we consider a method for "focused" instrument selection whereby genetic variants are selected to minimise the estimated asymptotic mean squared error of causal effect estimates. In a setting of many weak and locally invalid instruments, we consider a novel strategy to construct confidence intervals for post-selection focused estimators which guards against the worst-case loss in asymptotic coverage. In empirical applications that (i) validate lipid drug targets and (ii) investigate vitamin D effects on a wide range of outcomes, our findings suggest that the optimal selection of instruments involves not only a small number of biologically justified valid instruments, but also many potentially invalid instruments.
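The bias-variance trade-off behind focused selection can be illustrated with a minimal sketch. It is not the estimator studied in this work: it assumes independent variants, per-variant ratio estimates with known standard errors, and a simple fixed-effect inverse-variance weighted (IVW) combination, and it ignores weak-instrument effects. The names `ivw`, `focused_select`, `valid_idx` and `candidate_sets` are hypothetical and introduced only for illustration. Each candidate set is scored by an estimated mean squared error, in which the squared bias is measured against the estimate from the instruments believed to be valid.

```python
import numpy as np


def ivw(ratio, se, idx):
    """Fixed-effect IVW estimate and its variance over the variants in idx."""
    w = 1.0 / se[idx] ** 2
    est = np.sum(w * ratio[idx]) / np.sum(w)
    var = 1.0 / np.sum(w)
    return est, var


def focused_select(ratio, se, valid_idx, candidate_sets):
    """Pick the candidate set with the smallest estimated MSE.

    Assumption (illustrative only): every candidate set is the believed-valid
    core plus some extra variants, and variants are mutually independent.
    """
    ratio = np.asarray(ratio, dtype=float)
    se = np.asarray(se, dtype=float)
    valid_idx = np.asarray(valid_idx, dtype=int)
    est_v, var_v = ivw(ratio, se, valid_idx)

    best = None
    for extra in candidate_sets:
        idx = np.concatenate([valid_idx, np.asarray(extra, dtype=int)])
        est_s, var_s = ivw(ratio, se, idx)
        # Squared bias relative to the valid-only estimate. Under the stated
        # assumptions Var(est_s - est_v) = var_v - var_s, which is subtracted
        # to correct for sampling noise and truncated at zero.
        bias2 = max((est_s - est_v) ** 2 - (var_v - var_s), 0.0)
        mse = bias2 + var_s
        if best is None or mse < best["mse"]:
            best = {"idx": idx, "estimate": est_s, "mse": mse}
    return best


# Toy data: variants 0-2 are believed valid, 3 agrees with them, 4 does not.
ratio = [0.5, 0.5, 0.5, 0.5, 2.0]
se = [0.1, 0.1, 0.1, 0.1, 0.1]
result = focused_select(ratio, se, valid_idx=[0, 1, 2],
                        candidate_sets=[[], [3], [4], [3, 4]])
print(result["idx"], result["estimate"], result["mse"])
```

In this toy example, adding variant 3 reduces the variance without introducing detectable bias, while adding variant 4 introduces a bias that outweighs the gain in precision, so the selected set is the valid core plus variant 3.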