We consider the problem of identifying intermediate variables (or mediators) that regulate the effect of a treatment on a response variable. While there has been significant research on this classical topic, little work has been done for the case where the set of potential mediators is high-dimensional (HD). A further complication arises when these mediators are interrelated (with unknown dependencies). Specifically, we assume that the causal structure of the treatment, the confounders, the potential mediators and the response is a (possibly unknown) directed acyclic graph (DAG). HD DAG models have previously been used for the estimation of causal effects from observational data. In particular, methods called IDA and joint-IDA have been developed for estimating the effects of single and multiple simultaneous interventions, respectively. In this paper, we propose an IDA-type method called MIDA for estimating so-called individual mediation effects from HD observational data. Although IDA and joint-IDA estimators have been shown to be consistent in certain sparse HD settings, their distributional asymptotic properties (such as convergence in distribution) have remained unknown in such settings, and inferential tools have been lacking. We prove HD consistency of MIDA for linear structural equation models with sub-Gaussian errors. More importantly, we derive distributional convergence results for MIDA in similar HD settings, which are applicable to IDA and joint-IDA estimators as well. To our knowledge, these are the first such distributional convergence results facilitating inference for IDA-type estimators. These results build on our novel theoretical results regarding uniform bounds for linear regression estimators over varying subsets of HD covariates, which may be of independent interest. Finally, we empirically validate our asymptotic theory for MIDA and demonstrate its usefulness via simulations and a real data application.
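As a rough illustration of the kind of quantity involved, the following sketch simulates a small linear structural equation model and computes a product-of-coefficients mediation effect for one mediator by covariate-adjusted regression. It is not the MIDA estimator: it assumes the DAG is known, uses only two mediators and Gaussian errors, and the product-of-coefficients definition used here may differ from the paper's exact definition of an individual mediation effect. All function names, edge weights and the graph itself are hypothetical choices made for this example.

```python
import numpy as np


def ols_coef(y, covariates):
    """Least-squares slopes of y on the given covariates (intercept fitted, then dropped)."""
    design = np.column_stack([np.ones(len(y)), covariates])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1:]


def simulate(n, rng):
    """Draw n samples from an assumed toy DAG: x -> m1 -> m2 -> y, plus x -> m2, x -> y, m1 -> y."""
    x = rng.normal(size=n)                                   # treatment (no parents)
    m1 = 0.8 * x + rng.normal(size=n)                        # mediator 1
    m2 = 0.5 * x + 0.6 * m1 + rng.normal(size=n)             # mediator 2 (depends on mediator 1)
    y = 0.3 * x + 0.7 * m1 + 0.4 * m2 + rng.normal(size=n)   # response
    return x, m1, m2, y


def mediation_effect_m1(x, m1, y):
    """Product of (effect of x on m1) and (effect of m1 on y, adjusting for m1's parent x).

    The adjustment set {x} is read off the assumed known DAG above.
    """
    effect_x_on_m1 = ols_coef(m1, x[:, None])[0]
    effect_m1_on_y = ols_coef(y, np.column_stack([m1, x]))[0]
    return effect_x_on_m1 * effect_m1_on_y


# True value in this toy model: 0.8 * (0.7 + 0.6 * 0.4) = 0.8 * 0.94
rng = np.random.default_rng(0)
x, m1, m2, y = simulate(20000, rng)
print("estimated:", mediation_effect_m1(x, m1, y))
print("true:     ", 0.8 * 0.94)
```

In this toy model, adjusting for the parent of `m1` is what makes the regression coefficient of `y` on `m1` a causal effect. With an unknown DAG the parent sets themselves must be estimated, which is the setting the abstract addresses.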