In observational studies, identification of causal effects is threatened by the potential for unmeasured confounding. Negative controls have become widely used to evaluate the presence of potential unmeasured confounding, thus enhancing the credibility of reported causal effect estimates. Going beyond simply testing for residual confounding, proximal causal inference (PCI) was recently developed to debias causal effect estimates subject to confounding by hidden factors by leveraging a pair of negative control variables, also known as treatment and outcome confounding proxies. While formal statistical inference has been developed for PCI, these methods can be challenging to implement in practice because they involve solving complex integral equations that are typically ill-posed. In this paper, we develop a regression-based PCI approach that employs a two-stage regression via familiar generalized linear models (GLMs) to implement the PCI framework, which completely obviates the need to solve difficult integral equations. In the first stage, one fits a GLM for the outcome confounding proxy in terms of the treatment confounding proxy and the primary treatment. In the second stage, one fits a GLM for the primary outcome in terms of the primary treatment, using the predicted value of the first-stage regression model as a regressor which, as we establish, accounts for any residual confounding for which the proxies are relevant. The proposed approach has merit in that (i) it is applicable to continuous, count, and binary outcomes, making it relevant to a wide range of real-world applications, and (ii) it is easy to implement using off-the-shelf software for GLMs. We establish the statistical properties of regression-based PCI and illustrate its performance in both synthetic and real-world empirical applications.
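
The two-stage procedure described above can be sketched for the simplest case, a continuous outcome with an identity link, where both GLMs reduce to ordinary least squares. This is a minimal illustration under stated assumptions, not the paper's implementation: the data-generating process, the function names (`ols`, `proximal_two_stage`, `simulate`), and the variable names (`u` for the hidden confounder, `z` and `w` for the treatment and outcome confounding proxies, `a` for the treatment, `y` for the outcome) are all hypothetical, and the simulation assumes every relationship is linear with a true treatment effect of 2.0.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients for y ~ X (X must already hold an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def proximal_two_stage(y, a, z, w):
    """Two-stage least squares with proxies (identity-link case only).

    Stage 1 regresses the outcome proxy w on the treatment a and the
    treatment proxy z. Stage 2 regresses the outcome y on a and the
    stage-1 fitted values. Returns the stage-2 coefficient on a.
    """
    ones = np.ones(len(y))

    # Stage 1: W ~ 1 + A + Z, keep the fitted values.
    X1 = np.column_stack([ones, a, z])
    w_hat = X1 @ ols(X1, w)

    # Stage 2: Y ~ 1 + A + W_hat; the coefficient on A is the effect estimate.
    X2 = np.column_stack([ones, a, w_hat])
    return ols(X2, y)[1]


def simulate(n=20000, effect=2.0, seed=0):
    """Hypothetical linear data with a hidden confounder u (assumption, not from the paper)."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)                            # unmeasured confounder
    z = u + rng.normal(size=n)                        # treatment confounding proxy
    w = u + rng.normal(size=n)                        # outcome confounding proxy
    a = u + rng.normal(size=n)                        # treatment, confounded by u
    y = effect * a + u + rng.normal(size=n)           # outcome, confounded by u
    return y, a, z, w


y, a, z, w = simulate()
naive = ols(np.column_stack([np.ones(len(y)), a]), y)[1]   # ignores u, so biased
pci = proximal_two_stage(y, a, z, w)
print(f"naive OLS: {naive:.3f}, two-stage proxy estimate: {pci:.3f}")
```

In this simulated setting the naive regression of `y` on `a` absorbs the confounder's contribution and overstates the effect, while the two-stage estimate should land close to the assumed value of 2.0. Standard errors reported by a plain second-stage fit would not account for the estimated first stage, so the sketch returns only the point estimate. For count or binary outcomes, the two least-squares fits would be replaced by GLM fits with the appropriate link in an off-the-shelf GLM package.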