We study the problem of learning conditional average treatment effects (CATE) from high-dimensional observational data with unobserved confounders. Unobserved confounders introduce ignorance (a level of unidentifiability) about an individual's response to treatment by inducing bias in CATE estimates. We present a new parametric interval estimator suited for high-dimensional data that, given a predefined bound on the level of hidden confounding, estimates the range of possible CATE values. Moreover, previous interval estimators do not account for ignorance about the CATE that stems from samples underrepresented in the original study or from samples that violate the overlap assumption. Our novel interval estimator also incorporates model uncertainty, so that practitioners can be made aware of out-of-distribution data. We prove that our estimator converges to tight bounds on the CATE under possible unobserved confounding, and we assess it using semi-synthetic, high-dimensional datasets.