We address applied and computational issues in inference on multiple treatment effects when there are many potential confounders. While there is abundant literature on the harmful effects of omitting relevant controls (under-selection), we show that over-selection can be comparably problematic, introducing substantial variance and a bias related to the non-random over-inclusion of controls. We provide a novel empirical Bayes framework that mitigates both the under-selection problems of standard high-dimensional methods and the over-selection issues of recent proposals, by learning whether each control's inclusion should be encouraged or discouraged. We develop efficient gradient-based and Expectation-Propagation model-fitting algorithms that render the approach practical for a wide class of models. A motivating problem is to estimate how the salary gap has evolved in recent years in relation to potentially discriminatory characteristics such as gender, race, ethnicity and place of birth. We found that some wage differences remained for female and black individuals, although they were smaller. A similar trend is observed when analyzing the joint contribution of these factors to deviations from the average salary. Our methodology also showed remarkable estimation robustness to the addition of spurious artificial controls, relative to existing proposals.
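The variance cost of over-selection can be illustrated with a small simulation. The sketch below is not the empirical Bayes method described above; it is a plain least-squares comparison under assumed settings (the function name, sample size, number of controls and coefficients are all hypothetical). The controls are correlated with the treatment but have no effect on the outcome, so including them leaves the treatment estimate unbiased in this setup while inflating its variance. It does not reproduce the bias from non-random over-inclusion.

```python
import numpy as np

def simulate_treatment_estimates(n=100, p=30, beta=1.0, reps=200, seed=0):
    """Compare OLS treatment estimates with and without spurious controls.

    Hypothetical setup: p controls drive the treatment d but not the
    outcome y, so adjusting for them is unnecessary (over-selection).
    """
    rng = np.random.default_rng(seed)
    est_small, est_over = [], []
    for _ in range(reps):
        X = rng.standard_normal((n, p))                  # spurious controls
        d = 0.5 * X.sum(axis=1) + rng.standard_normal(n) # treatment depends on X
        y = beta * d + rng.standard_normal(n)            # outcome depends on d only

        # Model 1: intercept + treatment only
        A = np.column_stack([np.ones(n), d])
        est_small.append(np.linalg.lstsq(A, y, rcond=None)[0][1])

        # Model 2: intercept + treatment + all spurious controls
        B = np.column_stack([np.ones(n), d, X])
        est_over.append(np.linalg.lstsq(B, y, rcond=None)[0][1])
    return np.array(est_small), np.array(est_over)

small, over = simulate_treatment_estimates()
print("treatment only:   mean=%.3f sd=%.3f" % (small.mean(), small.std()))
print("with 30 controls: mean=%.3f sd=%.3f" % (over.mean(), over.std()))
```

Both estimators centre on the true effect here, but the over-selected model has a visibly larger standard deviation, because conditioning on the controls removes most of the variation in the treatment that identifies its effect.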