Improving instrumental variable estimators with post-stratification

An instrumental variable (IV) is a device that encourages units in a study to be exposed to a treatment. Under a set of key assumptions, a valid instrument allows for consistent estimation of treatment effects for compliers (those who are exposed to treatment only when encouraged to do so), even in the presence of unobserved confounders. Unfortunately, popular IV estimators can be unstable in studies with a small fraction of compliers. Here, we explore post-stratifying the data using variables that predict complier status (and, potentially, the outcome) to yield better estimation and inferential properties. We outline an estimator that is a weighted average of IV estimates within each stratum, weighting the stratum estimates by their estimated proportion of compliers. We then explore the benefits of post-stratification in terms of bias reduction, variance reduction, and improved standard error estimates, providing derivations that identify the direction of bias as a function of the relative means of the compliers and non-compliers. We also provide a finite-sample asymptotic formula for the variance of the post-stratified estimators. We demonstrate the relative performance of different IV approaches in simulation studies and discuss the advantages of our design-based post-stratification approach over incorporating compliance-predictive covariates into two-stage least squares regressions. In the end, we show that covariates predictive of the outcome can increase precision, but only if one is willing to make a bias-variance trade-off by down-weighting or dropping those strata with few compliers. Our methods are further exemplified in an application.
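
To make the weighted-average estimator concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a binary encouragement indicator z, a binary treatment indicator d, and a discrete stratum label for each unit. It also assumes that the within-stratum IV estimate is the usual ratio of intention-to-treat differences, and that "estimated proportion of compliers" means the stratum's estimated share of all compliers (stratum size times the estimated compliance rate). The function name post_stratified_iv and the record layout are hypothetical.

```python
from collections import defaultdict


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def post_stratified_iv(records):
    """Weighted average of within-stratum IV estimates.

    records: iterable of (stratum, z, d, y) tuples, where z = 1 if the
    unit was encouraged, d = 1 if it took treatment, y is the outcome.
    Assumption (not from the source): each stratum is weighted by its
    estimated number of compliers, n_s * (estimated compliance rate).
    """
    by_stratum = defaultdict(list)
    for stratum, z, d, y in records:
        by_stratum[stratum].append((z, d, y))

    weighted_sum = 0.0
    total_weight = 0.0
    for stratum, rows in by_stratum.items():
        encouraged = [(d, y) for z, d, y in rows if z == 1]
        control = [(d, y) for z, d, y in rows if z == 0]
        if not encouraged or not control:
            raise ValueError(f"stratum {stratum!r} lacks one encouragement arm")

        # Estimated compliance rate: effect of encouragement on uptake.
        itt_d = mean([d for d, _ in encouraged]) - mean([d for d, _ in control])
        # Effect of encouragement on the outcome.
        itt_y = mean([y for _, y in encouraged]) - mean([y for _, y in control])
        if itt_d == 0:
            raise ValueError(f"stratum {stratum!r} has zero estimated compliance")

        iv_estimate = itt_y / itt_d      # within-stratum IV (Wald) estimate
        weight = len(rows) * itt_d       # estimated number of compliers
        weighted_sum += weight * iv_estimate
        total_weight += weight

    return weighted_sum / total_weight
```

Under this weighting the estimator reduces algebraically to the ratio of the size-weighted outcome differences to the size-weighted uptake differences, so strata with few estimated compliers contribute little. The sketch raises an error for a stratum with zero estimated compliance instead of dropping it, because dropping or down-weighting such strata is exactly the bias-variance trade-off that the abstract leaves as a deliberate choice.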
