Effect modification occurs when the effect of a treatment on an outcome varies with the level of other covariates, and it often has important implications for decision making. When there are tens or hundreds of covariates, it becomes necessary to use the observed data to select a simpler model for effect modification and then make valid statistical inference. We propose a two-stage procedure to solve this problem. First, we use Robinson's transformation to decouple the nuisance parameters from the treatment effect of interest and use machine learning algorithms to estimate the nuisance parameters. Next, after plugging in the estimates of the nuisance parameters, we use the Lasso to choose a low-complexity model for effect modification. Compared to a full model consisting of all the covariates, the selected model is much more interpretable. Compared to univariate subgroup analyses, the selected model greatly reduces the number of false discoveries. We show that conditional selective inference for the selected model is asymptotically valid given the rate assumptions in classical semiparametric regression. Extensive simulation studies verify the asymptotic results, and an epidemiological application demonstrates the method.
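The two stages described above can be illustrated with a minimal sketch, under several assumptions that are not taken from the paper. It assumes a partially linear model Y = eta(X) + T * Delta(X) + noise, for which Robinson's transformation gives Y - E[Y|X] = (T - E[T|X]) * Delta(X) + noise. A cross-fitted ridge regression on quadratic features stands in for the unspecified machine learning algorithms, the Lasso is solved by a small coordinate-descent routine, and the penalty level `lam=0.05` is an arbitrary illustrative choice. All function names (`ridge_predict`, `cross_fit`, `lasso_cd`, `select_effect_modifiers`, `simulate`) are hypothetical, and the conditional selective inference step is not implemented here.

```python
import numpy as np


def ridge_predict(X_train, y_train, X_test, alpha=1.0):
    """Stand-in nuisance learner (assumption): ridge on [1, X, X^2] features."""
    def features(X):
        return np.hstack([np.ones((X.shape[0], 1)), X, X ** 2])
    A, B = features(X_train), features(X_test)
    coef = np.linalg.solve(A.T @ A + alpha * np.eye(A.shape[1]), A.T @ y_train)
    return B @ coef


def cross_fit(X, y, n_folds=2, seed=0):
    """Out-of-fold predictions of E[y | X], so no unit predicts itself."""
    n = X.shape[0]
    fold = np.random.default_rng(seed).permutation(n) % n_folds
    pred = np.empty(n)
    for k in range(n_folds):
        held_out = fold == k
        pred[held_out] = ridge_predict(X[~held_out], y[~held_out], X[held_out])
    return pred


def lasso_cd(Z, r, lam, weights, n_iter=100):
    """Coordinate descent for (1/2n)||r - Z b||^2 + lam * sum_j weights[j]*|b_j|."""
    n, p = Z.shape
    b = np.zeros(p)
    col_sq = (Z ** 2).sum(axis=0) / n
    resid = r.astype(float).copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += Z[:, j] * b[j]          # remove coordinate j from the fit
            rho = Z[:, j] @ resid / n
            shrunk = np.sign(rho) * max(abs(rho) - lam * weights[j], 0.0)
            b[j] = shrunk / col_sq[j]
            resid -= Z[:, j] * b[j]          # put the updated coordinate back
    return b


def select_effect_modifiers(X, T, Y, lam=0.05):
    """Stage 1: residualize Y and T on X. Stage 2: Lasso on T-residual * [1, X]."""
    y_res = Y - cross_fit(X, Y)              # Y - estimated E[Y | X]
    t_res = T - cross_fit(X, T)              # T - estimated E[T | X]
    design = np.hstack([np.ones((X.shape[0], 1)), X])
    Z = design * t_res[:, None]
    weights = np.r_[0.0, np.ones(X.shape[1])]  # leave the intercept unpenalized
    beta = lasso_cd(Z, y_res, lam, weights)
    selected = np.flatnonzero(beta[1:] != 0.0)  # indices of selected covariates
    return beta, selected


def simulate(n=2000, p=10, seed=1):
    """Toy data (assumption): only covariate 0 modifies the treatment effect."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))
    T = rng.binomial(1, 0.5, size=n).astype(float)
    Y = X[:, 1] + X[:, 2] ** 2 + T * (1.0 + 2.0 * X[:, 0]) + rng.normal(size=n)
    return X, T, Y
```

Calling `beta, selected = select_effect_modifiers(*simulate())` returns the fitted coefficients (intercept first) and the indices of covariates with nonzero coefficients. Naive confidence intervals computed on the selected covariates would ignore the selection step, which is the gap that conditional selective inference is meant to close.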