For the last two decades, high-dimensional data and methods have proliferated throughout the literature. Yet the classical technique of linear regression has not lost its usefulness in applications. In fact, many high-dimensional estimation techniques can be seen as variable selection leading to a smaller set of variables (a ``submodel'') to which classical linear regression applies. We analyze linear regression estimators resulting from model selection by proving estimation error and linear representation bounds uniformly over sets of submodels. Based on deterministic inequalities, our results provide ``good'' rates when applied to both independent and dependent data. These results are useful for meaningfully interpreting the linear regression estimator obtained after exploring and reducing the variables, and also for justifying post-model-selection inference. All results are derived under no model assumptions and are non-asymptotic in nature.
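To make the object of study concrete, the following is a minimal sketch (not taken from the paper) of the kind of estimator analyzed here: a data-driven variable selector, with the lasso used purely as one illustrative choice, followed by ordinary least squares refit on the selected submodel. The specific data-generating setup and tuning parameter are arbitrary assumptions for illustration only.

\begin{verbatim}
# Illustrative sketch: variable selection followed by OLS on the
# selected submodel (the lasso here is only one example of a selector).
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
y = X[:, :3] @ np.array([2.0, -1.0, 0.5]) + rng.standard_normal(n)

# Step 1: data-driven variable selection; any selection rule yields
# some submodel (a subset of the p columns).
selected = np.flatnonzero(Lasso(alpha=0.1).fit(X, y).coef_ != 0)

# Step 2: classical linear regression restricted to that submodel.
ols_refit = LinearRegression().fit(X[:, selected], y)
print("selected submodel:", selected)
print("post-selection OLS coefficients:", ols_refit.coef_)
\end{verbatim}

The results of the paper concern the second step: bounds on the estimation error and the linear representation of such refitted estimators that hold uniformly over collections of submodels, irrespective of how the first step was carried out.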