A basic principle in the design of observational studies is to approximate the randomized experiment that would have been conducted under controlled circumstances. Linear regression models are commonly used to analyze observational data and estimate causal effects. How do linear regression adjustments in observational studies emulate key features of randomized experiments, such as covariate balance, self-weighted sampling, and study representativeness? In this paper, we answer this and related questions by analyzing the implied (individual-level data) weights of linear regression methods. We derive new closed-form expressions for the weights and examine their properties in both finite-sample and asymptotic regimes. We show that the implied weights of general regression problems can be equivalently obtained by solving a convex optimization problem. This equivalence allows us to bridge ideas from the regression modeling and causal inference literatures. As a result, we propose novel regression diagnostics for causal inference that are part of the design stage of an observational study. Among other results, we study doubly and multiply robust properties of regression estimators from the perspective of their implied weights. As special cases, we analyze the implied weights in common settings such as multi-valued treatments and regression adjustment after matching. We implement the weights and diagnostics in the new lmw package for R.
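To make the notion of implied weights concrete, the following is a minimal sketch for the simplest case only: ordinary least squares of an outcome on an intercept, a binary treatment, and covariates. It relies on the standard Frisch-Waugh-Lovell result that the treatment coefficient is a weighted sum of the outcomes, with weights built from the treatment residualized on the covariates. It uses NumPy on simulated data and does not use or reproduce the lmw package; the function name `implied_ols_weights` and the data-generating process are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def implied_ols_weights(Z, X):
    """Weights w such that the OLS coefficient on Z in Y ~ 1 + Z + X equals w @ Y.

    Hypothetical helper for illustration (Frisch-Waugh-Lovell), not the lmw API.
    """
    n = len(Z)
    D = np.column_stack([np.ones(n), X])  # intercept plus covariates
    # Residualize the treatment on the intercept and covariates.
    coef, *_ = np.linalg.lstsq(D, Z, rcond=None)
    Z_res = Z - D @ coef
    return Z_res / (Z_res @ Z_res)


# Simulated observational data (assumed setup): treatment depends on X[:, 0].
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
Z = (rng.uniform(size=n) < 1 / (1 + np.exp(-X[:, 0]))).astype(float)
Y = 1.0 + 2.0 * Z + X @ np.array([0.5, -1.0]) + rng.normal(size=n)

w = implied_ols_weights(Z, X)

# The treatment coefficient from the full OLS fit, for comparison.
design = np.column_stack([np.ones(n), Z, X])
beta, *_ = np.linalg.lstsq(design, Y, rcond=None)
tau_ols = beta[1]
tau_weighted = w @ Y

# Signed weighted covariate sums over treated and control units cancel,
# which is the exact mean balance implied by the regression.
treated = Z == 1
balance_gap = w[treated] @ X[treated] + w[~treated] @ X[~treated]

print("OLS coefficient:      ", tau_ols)
print("Weighted sum of Y:    ", tau_weighted)
print("Sum of treated weights:", w[treated].sum())
print("Sum of control weights:", w[~treated].sum())
print("Covariate balance gap: ", balance_gap)
```

In this sketch the weights on treated units sum to 1 and those on control units sum to -1, and the weighted covariate means of the two groups coincide exactly. Individual weights can be negative within either group, which is one reason weight-based diagnostics are informative.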