The method of instrumental variables provides a fundamental and practical tool for causal inference in empirical studies where unmeasured confounding between the treatments and the outcome is present. Modern data from such studies, for example genetical genomics data, are often high-dimensional. High-dimensional linear instrumental-variables regression has been studied in the literature for its simplicity, even though the true relationship may be nonlinear. We propose a more data-driven approach that fits nonparametric additive models between the instruments and the treatments while keeping a linear model between the treatments and the outcome, so that the coefficients of the latter retain a direct causal interpretation. We provide a two-stage framework for estimation and inference under this more general setup. Group lasso regularization is first employed to select optimal instruments from the high-dimensional additive models, and the outcome variable is then regressed on the fitted values from the additive models to identify and estimate important treatment effects. We provide a non-asymptotic analysis of the estimation error of the proposed estimator. A debiasing procedure is further employed to yield valid inference. Extensive numerical experiments show that our method performs comparably to or better than existing approaches in the literature. Finally, we analyze mouse obesity data and discuss new findings from our method.
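The two-stage idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the polynomial basis, the proximal-gradient solvers, the tuning values `lam1` and `lam2`, the simulated data, and all function names are assumptions made for illustration, and the debiasing step is omitted.

```python
import numpy as np

def basis(Z, degree=3):
    """Polynomial basis per instrument (an assumed stand-in for splines)."""
    cols, groups = [], []
    for j in range(Z.shape[1]):
        for d in range(1, degree + 1):
            c = Z[:, j] ** d
            c = c - c.mean()
            cols.append(c / (np.linalg.norm(c) / np.sqrt(len(c))))
            groups.append(j)  # all basis columns of instrument j form one group
    return np.column_stack(cols), np.array(groups)

def group_lasso(B, x, groups, lam, n_iter=500):
    """Proximal gradient for (1/2n)||x - B g||^2 + lam * sum_j ||g_j||_2."""
    n = B.shape[0]
    step = n / np.linalg.norm(B, 2) ** 2  # 1 / Lipschitz constant
    g = np.zeros(B.shape[1])
    for _ in range(n_iter):
        g = g - step * (B.T @ (B @ g - x)) / n
        for j in np.unique(groups):  # group-wise soft thresholding
            idx = groups == j
            norm = np.linalg.norm(g[idx])
            g[idx] *= max(0.0, 1 - step * lam / norm) if norm > 0 else 0.0
    return g

def lasso(X, y, lam, n_iter=500):
    """Proximal gradient (ISTA) for (1/2n)||y - X b||^2 + lam * ||b||_1."""
    n = X.shape[0]
    step = n / np.linalg.norm(X, 2) ** 2
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        b = b - step * (X.T @ (X @ b - y)) / n
        b = np.sign(b) * np.maximum(np.abs(b) - step * lam, 0.0)
    return b

def two_stage(Z, X, y, lam1, lam2):
    """Stage 1: additive fit of each treatment on the instruments.
    Stage 2: sparse linear regression of the outcome on the fitted values."""
    B, groups = basis(Z)
    X_hat = np.empty_like(X)
    for k in range(X.shape[1]):
        xk = X[:, k]
        g = group_lasso(B, xk - xk.mean(), groups, lam1)
        X_hat[:, k] = xk.mean() + B @ g
    beta = lasso(X_hat - X_hat.mean(axis=0), y - y.mean(), lam2)
    return beta, X_hat

# Simulated example (assumed design): u is an unmeasured confounder,
# instruments act nonlinearly, and only the first treatment affects y.
rng = np.random.default_rng(0)
n = 1000
Z = rng.normal(size=(n, 5))
u = rng.normal(size=n)
X = np.column_stack([
    Z[:, 0] ** 2 + Z[:, 1] + u + 0.5 * rng.normal(size=n),
    Z[:, 2] + np.sin(Z[:, 3]) + u + 0.5 * rng.normal(size=n),
])
y = 1.0 * X[:, 0] + u + 0.5 * rng.normal(size=n)

beta, X_hat = two_stage(Z, X, y, lam1=0.05, lam2=0.02)
print("estimated treatment effects:", beta)
```

In this toy design the true effects are 1 and 0. Because the fitted values depend only on the instruments, the second-stage regression is not contaminated by the confounder `u`, which is the reason for regressing on fitted values rather than on the observed treatments.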