Given an observational study with $n$ independent but heterogeneous units, our goal is to learn the counterfactual distribution for each unit using only one $p$-dimensional sample per unit containing covariates, interventions, and outcomes. Specifically, we allow for unobserved confounding, which introduces statistical biases between interventions and outcomes and exacerbates the heterogeneity across units. Modeling the underlying joint distribution as an exponential family, we reduce learning the unit-level counterfactual distributions to learning $n$ exponential family distributions with heterogeneous parameters and only one sample per distribution. We introduce a convex objective that pools all $n$ samples to jointly learn all $n$ parameter vectors, and provide a unit-wise mean squared error bound that scales linearly with the metric entropy of the parameter space. For example, when each parameter vector is an $s$-sparse linear combination of $k$ known vectors, the error is $O(s\log k/p)$. En route, we derive sufficient conditions for compactly supported distributions to satisfy the logarithmic Sobolev inequality. As an application of the framework, our results enable consistent imputation of sparsely missing covariates.
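To give a feel for the $O(s\log k/p)$ rate stated above, the following minimal sketch evaluates the quantity $s\log k/p$ for a few dimensions $p$. The function name `sparse_rate` and the chosen values of $s$, $k$, and $p$ are illustrative assumptions, and the constant hidden by the $O(\cdot)$ notation is not specified in the abstract, so the numbers indicate only how the bound scales, not its actual size.

```python
import math


def sparse_rate(s: int, k: int, p: int) -> float:
    """Return s * log(k) / p, the order of the error bound with constants dropped.

    Hypothetical helper for illustration only: s is the sparsity level,
    k the number of known vectors, p the dimension of each unit's sample.
    """
    if s < 1 or k < 2 or p < 1:
        raise ValueError("expected s >= 1, k >= 2, p >= 1")
    return s * math.log(k) / p


if __name__ == "__main__":
    # Illustrative values (assumptions): sparsity 5, dictionary of 1000 vectors.
    for p in (100, 1_000, 10_000):
        print(f"p={p:>6}  s*log(k)/p = {sparse_rate(s=5, k=1_000, p=p):.5f}")
```

The quantity shrinks as $p$ grows with $s$ and $k$ fixed, which is why a single sample per unit can suffice when that sample is high-dimensional.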