Understanding treatment effect heterogeneity is important for decision making in medical and clinical practice, as well as for addressing engineering and marketing challenges. When dealing with high-dimensional covariates, or when the effect modifiers are not predefined and need to be discovered, data-adaptive selection approaches become essential. However, data-driven model selection complicates the quantification of statistical uncertainty, because the sampling distribution of the target estimator after selection is difficult to approximate. Data-driven model selection tends to favor models with strong effect modifiers, at the cost of inflated type I error rates. Although several frameworks and methods for valid statistical inference have been proposed for ordinary least squares regression following data-driven model selection, fewer options exist for valid inference on effect modifier discovery in causal modeling contexts. In this article, we extend two different methods to develop valid inference for penalized G-estimation, which investigates effect modification of proximal treatment effects within the structural nested mean model framework. We show the asymptotic validity of the proposed methods. Using extensive simulation studies, we evaluate and compare the finite-sample performance of the proposed methods and of naive inference based on a sandwich variance estimator. Our work is motivated by a study of hemodiafiltration for treating patients with end-stage renal disease at the Centre Hospitalier de l'Université de Montréal. We apply these methods to draw inference about the effect heterogeneity of dialysis facility on the repeated session-specific hemodiafiltration outcomes.