This paper considers a multiple-environment linear regression model in which data are collected from multiple experimental settings. The joint distribution of the response variable and the covariates may vary across environments, yet the conditional expectation of $y$ given the unknown set of important variables is invariant across environments. Such a statistical model is related to the problems of endogeneity, causal inference, and transfer learning. Its motivation is illustrated by showing how the goals of prediction and attribution are inherent in estimating the true parameter and the important variable set. We construct a novel {\it environment invariant linear least squares (EILLS)} objective function, a multiple-environment version of linear least squares that leverages the above conditional expectation invariance structure and the heterogeneity among environments to determine the true parameter. Our proposed method is applicable without any additional structural knowledge and can identify the true parameter under a near-minimal identification condition. We establish non-asymptotic $\ell_2$ bounds on the estimation error of the EILLS estimator in the presence of spurious variables. Moreover, we show that the EILLS estimator is able to eliminate all endogenous variables and that the $\ell_0$-penalized EILLS estimator can achieve variable selection consistency in high-dimensional regimes. These non-asymptotic results demonstrate the sample efficiency of the EILLS estimator and its capability to circumvent the curse of endogeneity in an algorithmic manner without any prior structural knowledge.
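To make the idea of an invariance-based least squares objective concrete, here is a minimal sketch under stated assumptions; it is not the paper's exact definition of EILLS. It relies on one consequence of the invariance assumption: if $E[y \mid x_{S^*}] = \beta^{*\top} x$ in every environment, then the residual $y - \beta^{*\top} x$ is uncorrelated with every $x_j$, $j \in S^*$, in every environment. The hypothetical objective below therefore adds to the pooled squared loss $\gamma$ times the sum, over environments and over variables in the support of $\beta$, of the squared sample covariance between $x_j$ and the residual. The names `eills_objective`, `fit_on_support`, `eills_brute_force`, the equal environment weights, the value of $\gamma$, and the toy data-generating process (a spurious variable $x_2 = y + \text{noise}$ whose noise level differs between two environments) are all illustrative assumptions. Because the objective is quadratic in $\beta$ once the support is fixed, the sketch solves each support exactly and brute-forces over all supports, which is only feasible for small $p$.

```python
import itertools

import numpy as np


def eills_objective(beta, envs, gamma):
    """Hypothetical EILLS-style objective (illustrative, not the paper's exact form).

    envs: list of (X, y) pairs, one per environment, weighted equally.
    Returns the summed per-environment squared loss plus gamma times the squared
    residual covariances E^(e)[(y - beta^T x) x_j] for every j in supp(beta).
    """
    support = np.flatnonzero(beta)
    total = 0.0
    for X, y in envs:
        resid = y - X @ beta
        total += np.mean(resid ** 2)
        cov = X[:, support].T @ resid / len(y)
        total += gamma * np.sum(cov ** 2)
    return total


def fit_on_support(envs, support, gamma):
    """Exact minimizer of the sketch objective over beta supported on `support`.

    On a fixed support S the objective is quadratic in beta_S, so the minimizer
    solves sum_e (G_e + gamma G_e^2) b = sum_e (c_e + gamma G_e c_e), where
    G_e = X_S^T X_S / n_e and c_e = X_S^T y / n_e. With gamma = 0 this is
    ordinary pooled least squares on S.
    """
    p = envs[0][0].shape[1]
    beta = np.zeros(p)
    cols = list(support)
    if not cols:
        return beta
    A = np.zeros((len(cols), len(cols)))
    r = np.zeros(len(cols))
    for X, y in envs:
        XS = X[:, cols]
        G = XS.T @ XS / len(y)
        c = XS.T @ y / len(y)
        A += G + gamma * G @ G
        r += c + gamma * G @ c
    beta[cols] = np.linalg.solve(A, r)
    return beta


def eills_brute_force(envs, gamma):
    """Minimize the sketch objective by enumerating every support (small p only)."""
    p = envs[0][0].shape[1]
    best_beta, best_val = None, np.inf
    for k in range(p + 1):
        for support in itertools.combinations(range(p), k):
            beta = fit_on_support(envs, support, gamma)
            val = eills_objective(beta, envs, gamma)
            if val < best_val:
                best_beta, best_val = beta, val
    return best_beta


def simulate_env(rng, n, spurious_noise_sd):
    """Toy environment: x1 is important, x2 is a spurious child of y."""
    x1 = rng.normal(size=n)
    y = 1.0 * x1 + rng.normal(size=n)
    x2 = y + spurious_noise_sd * rng.normal(size=n)
    return np.column_stack([x1, x2]), y


rng = np.random.default_rng(0)
envs = [simulate_env(rng, 5000, 0.1), simulate_env(rng, 5000, 2.0)]
beta_eills = eills_brute_force(envs, gamma=10.0)
beta_pooled = fit_on_support(envs, (0, 1), gamma=0.0)  # pooled least squares
print("sketch EILLS:", np.round(beta_eills, 2))
print("pooled LS:   ", np.round(beta_pooled, 2))
```

In this toy setup pooled least squares places substantial weight on the spurious $x_2$, because $x_2$ is highly predictive of $y$, whereas the invariance penalty makes any support containing $x_2$ expensive: no single coefficient vector leaves the residual uncorrelated with $x_2$ in both environments once its noise level differs. The sketch should then select only $x_1$ with a coefficient near 1. Note that the brute-force search grows exponentially in $p$ and is not meant to reflect how the paper computes its estimator.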