We discuss causal inference for observational studies with possibly invalid instrumental variables. We propose a novel methodology called two-stage curvature identification (TSCI), which explores the nonlinear treatment model with machine learning and adjusts for different forms of violation of the instrumental variable assumptions. The success of TSCI requires the functional form of the instrumental variable's effect on the treatment to differ from its violation form. A novel bias correction step removes the bias that results from the potentially high complexity of the machine learning algorithm. Our proposed TSCI estimator is shown to be asymptotically unbiased and normal even if the machine learning algorithm does not consistently estimate the treatment model. We design a data-dependent method to choose the best among several candidate violation forms. We apply TSCI to study the effect of education on earnings.
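The identification idea can be illustrated with a small simulation. The sketch below is not the TSCI estimator itself. It replaces the machine learning first stage with a cubic polynomial regression, assumes the violation form is linear in the instrument, and omits both the bias correction and the data-dependent choice of violation form. The function names (`residualize`, `curvature_iv`, `naive_iv`, `simulate`) and the data-generating process are hypothetical and chosen only for illustration.

```python
import numpy as np


def residualize(x, V):
    """Return the part of x orthogonal to the columns of V (OLS residuals)."""
    coef, *_ = np.linalg.lstsq(V, x, rcond=None)
    return x - V @ coef


def curvature_iv(Y, D, Z, degree=3):
    """Estimate the treatment effect from curvature in the first stage.

    Assumption (not from the source): the violation is linear in Z, and a
    polynomial fit stands in for the machine learning first stage.
    """
    # First stage: polynomial regression of the treatment D on Z.
    B = np.column_stack([Z**k for k in range(degree + 1)])
    coef, *_ = np.linalg.lstsq(B, D, rcond=None)
    f_hat = B @ coef
    # Assumed violation form: an intercept plus a linear term in Z.
    V = np.column_stack([np.ones_like(Z), Z])
    # Remove the violation form; only the nonlinear part of f_hat remains.
    f_res = residualize(f_hat, V)
    Y_res = residualize(Y, V)
    D_res = residualize(D, V)
    return float(f_res @ Y_res) / float(f_res @ D_res)


def naive_iv(Y, D, Z):
    """Standard IV ratio estimator that treats Z as a valid instrument."""
    Zc = Z - Z.mean()
    return float(Zc @ Y) / float(Zc @ D)


def simulate(n=20000, beta=1.0, pi=0.5, seed=0):
    """Hypothetical data: nonlinear first stage, linear direct effect of Z."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=n)
    U = rng.normal(size=n)  # unmeasured confounder
    D = Z + Z**2 + U + rng.normal(size=n)
    Y = beta * D + pi * Z + U + rng.normal(size=n)  # pi * Z violates validity
    return Y, D, Z


if __name__ == "__main__":
    Y, D, Z = simulate()
    print("naive IV:    ", naive_iv(Y, D, Z))
    print("curvature IV:", curvature_iv(Y, D, Z))
```

In this simulated setting the instrument affects the outcome directly, so the naive estimate is pulled away from the true effect of 1.0, while the curvature-based estimate stays close to it because the quadratic part of the first stage differs from the assumed linear violation form. If the first stage were itself linear in Z, nothing would remain after residualizing and the ratio would be unstable, which mirrors the requirement stated in the abstract.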