Given an observational study with $n$ independent but heterogeneous units and one $p$-dimensional sample per unit containing covariates, interventions, and outcomes, our goal is to learn the counterfactual distribution for each unit. We consider studies with unobserved confounding, which introduces statistical biases between interventions and outcomes and exacerbates the heterogeneity across units. Modeling the underlying joint distribution as an exponential family, and under suitable conditions, we reduce learning the $n$ unit-level counterfactual distributions to learning $n$ exponential family distributions with heterogeneous parameters and only one sample per distribution. We introduce a convex objective that pools all $n$ samples to jointly learn all $n$ parameters, and we provide a unit-wise mean squared error bound that scales linearly with the metric entropy of the parameter space. For example, when the parameters are $s$-sparse linear combinations of $k$ known vectors, the error is $O(s\log k/p)$. En route, we derive sufficient conditions for compactly supported distributions to satisfy the logarithmic Sobolev inequality.