Instrumental variable (IV) methods are used to estimate causal effects in settings with unobserved confounding, where we cannot directly experiment on the treatment variable. Instruments are variables that affect the outcome only indirectly, through the treatment variable(s). Most IV applications focus on low-dimensional treatments and crucially require at least as many instruments as treatments. This assumption is restrictive: in the natural sciences we often seek to infer causal effects of high-dimensional treatments (e.g., the effect of gene expressions or microbiota on health and disease), but can run only a few experiments with a limited number of instruments (e.g., drugs or antibiotics). In such underspecified problems, the full treatment effect is not identifiable from a single experiment, even in the linear case. We show that one can still reliably recover the projection of the treatment effect onto the instrumented subspace, and we develop techniques to consistently combine such partial estimates from different sets of instruments. We then leverage our combined estimators in an algorithm that iteratively proposes the most informative instruments at each round of experimentation, so as to maximize the overall information about the full causal effect.
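To make the idea of a partial estimate concrete, here is a minimal numerical sketch of the linear case. It is an illustration under stated assumptions, not the estimator developed in the paper. It assumes a linear model with standard-normal instruments, uses the moment condition Cov(Z, Y) = Cov(Z, X) β, and takes the minimum-norm solution via a pseudoinverse, which lies in the row space of Cov(Z, X) (the "instrumented subspace" in this sketch). Partial estimates are combined simply by stacking the moment conditions of several experiments. All function names, the data-generating process, and the parameter values are hypothetical.

```python
import numpy as np


def simulate_experiment(beta, A, n, rng):
    """Simulate one linear IV experiment with a hidden confounder (assumed model).

    A has shape (m, d): row i describes how instrument i shifts the d treatments.
    """
    m, d = A.shape
    Z = rng.normal(size=(n, m))            # instruments, independent of the confounder
    U = rng.normal(size=(n, 1))            # unobserved confounder
    X = Z @ A + U + rng.normal(size=(n, d))             # treatments (U shifts all of them)
    Y = X @ beta + 2.0 * U[:, 0] + rng.normal(size=n)   # outcome, confounded through U
    return Z, X, Y


def moments(Z, X, Y):
    """Empirical Cov(Z, X) of shape (m, d) and Cov(Z, Y) of shape (m,)."""
    n = Z.shape[0]
    Zc = Z - Z.mean(axis=0)
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean()
    return Zc.T @ Xc / n, Zc.T @ Yc / n


def partial_estimate(C, c):
    """Minimum-norm solution of C @ b = c; it lies in the row space of C."""
    return np.linalg.pinv(C) @ c


def combined_estimate(moment_list):
    """Combine experiments by stacking their moment conditions (an assumed scheme)."""
    C = np.vstack([C_k for C_k, _ in moment_list])
    c = np.concatenate([c_k for _, c_k in moment_list])
    return np.linalg.pinv(C) @ c


rng = np.random.default_rng(0)
beta = np.array([1.0, -2.0, 0.5, 3.0])     # true effect of d = 4 treatments
A1 = np.eye(4)[:2]                         # experiment 1 instruments treatments 0 and 1
A2 = np.eye(4)[2:]                         # experiment 2 instruments treatments 2 and 3

m1 = moments(*simulate_experiment(beta, A1, 100_000, rng))
m2 = moments(*simulate_experiment(beta, A2, 100_000, rng))

beta_partial = partial_estimate(*m1)       # only m = 2 instruments for d = 4 treatments
beta_projected = np.linalg.pinv(A1) @ A1 @ beta   # projection of beta onto row space of A1
beta_combined = combined_estimate([m1, m2])

print("partial :", np.round(beta_partial, 2))
print("target  :", np.round(beta_projected, 2))
print("combined:", np.round(beta_combined, 2))
```

In this toy setup each experiment on its own determines only two of the four directions of β, while the two instrumented subspaces together span the full treatment space, so the stacked estimate approximates the full effect. Choosing which instruments to use next, as the abstract describes, is not covered by this sketch.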