Heteroscedasticity-aware sample trimming for causal inference

A popular method for variance reduction in observational causal inference is propensity-based trimming, the practice of removing units with extreme propensities from the sample. This practice has theoretical grounding when the data are homoscedastic and the propensity model is parametric (Yang and Ding, 2018; Crump et al., 2009), but in modern settings where heteroscedastic data are analyzed with non-parametric models, existing theory fails to support current practice. In this work, we address this challenge by developing new methods and theory for sample trimming. Our contributions are threefold. First, we describe novel procedures for selecting which units to trim. Our procedures differ from previous work in that we trim not only units with small propensities, but also units with extreme conditional variances. Second, we give new theoretical guarantees for inference after trimming. In particular, we show how to perform inference on the trimmed subpopulation without requiring that our regressions converge at parametric rates. Instead, we make only fourth-root rate assumptions like those in the double machine learning literature. This result also applies to conventional propensity-based trimming and thus may be of independent interest. Finally, we propose a bootstrap-based method for constructing simultaneously valid confidence intervals for multiple trimmed subpopulations, which are valuable for navigating the trade-off between sample size and variance reduction inherent in trimming. We validate our methods in simulation, on the 2007-2008 National Health and Nutrition Examination Survey, and on a semi-synthetic Medicare dataset, and find promising results in all settings.
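
To make the idea of trimming on both propensities and conditional variances concrete, the following is a minimal sketch and not the procedure developed in this work. It assumes that estimates of the propensity score and of the two arm-specific conditional outcome variances are already available for every unit, and it combines the conventional rule of keeping units with propensity in [alpha, 1 - alpha] with a cap on the per-unit quantity var1/e + var0/(1 - e), which is the term through which propensities and conditional variances enter the semiparametric efficiency bound for the average treatment effect. The function name `trim_mask`, the thresholds `alpha` and `gamma`, and the toy inputs are all hypothetical choices made for illustration.

```python
import numpy as np

def trim_mask(e_hat, var1_hat, var0_hat, alpha=0.1, gamma=None):
    """Return (keep, k): a boolean mask of retained units and the
    per-unit variance contribution k = var1/e + var0/(1 - e).

    Illustrative only: alpha and gamma are hypothetical thresholds,
    not values prescribed by the abstract above.
    """
    e_hat = np.asarray(e_hat, dtype=float)
    var1_hat = np.asarray(var1_hat, dtype=float)
    var0_hat = np.asarray(var0_hat, dtype=float)

    # Conventional propensity-based trimming: drop extreme propensities.
    keep = (e_hat >= alpha) & (e_hat <= 1.0 - alpha)

    # Heteroscedasticity-aware score; propensities of exactly 0 or 1
    # yield inf or nan, which never pass the gamma comparison below.
    with np.errstate(divide="ignore", invalid="ignore"):
        k = var1_hat / e_hat + var0_hat / (1.0 - e_hat)

    # Optionally drop units whose variance contribution is too large.
    if gamma is not None:
        keep = keep & (k <= gamma)
    return keep, k


# Toy inputs (made up): units 1 and 4 have extreme propensities,
# unit 3 has a moderate propensity but a large conditional variance.
e_hat = [0.05, 0.5, 0.5, 0.95]
var1_hat = [1.0, 1.0, 10.0, 1.0]
var0_hat = [1.0, 1.0, 10.0, 1.0]
keep, k = trim_mask(e_hat, var1_hat, var0_hat, alpha=0.1, gamma=10.0)
```

In this toy input, a purely propensity-based rule would retain both the second and third units, while the variance cap additionally removes the third. Any downstream estimate then targets the retained subpopulation rather than the full sample.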
