The Lasso with general Gaussian designs with applications to hypothesis testing

The Lasso is a method for high-dimensional regression, which is now commonly used when the number of covariates $p$ is of the same order as, or larger than, the number of observations $n$. Classical asymptotic normality theory does not apply to this model for two fundamental reasons: $(1)$ the regularized risk is non-smooth; $(2)$ the distance between the estimator $\widehat{\boldsymbol{\theta}}$ and the true parameter vector $\boldsymbol{\theta}^*$ cannot be neglected. As a consequence, the standard perturbative arguments that are the traditional basis for asymptotic normality fail. On the other hand, the Lasso estimator can be precisely characterized in the regime in which both $n$ and $p$ are large and $n/p$ is of order one. This characterization was first obtained for Gaussian designs with i.i.d. covariates; here we generalize it to correlated Gaussian designs with non-singular covariance structure. The characterization is expressed in terms of a simpler ``fixed-design'' model. We establish non-asymptotic bounds on the distance between the distributions of various quantities in the two models, which hold uniformly over signals $\boldsymbol{\theta}^*$ in a suitable sparsity class and over values of the regularization parameter. As an application, we study the distribution of the debiased Lasso and show that a degrees-of-freedom correction is necessary for computing valid confidence intervals.
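The application in the last sentence can be illustrated with a rough sketch. The Python code below assumes NumPy. It fits the Lasso $\min_{\boldsymbol{\theta}} \frac{1}{2n}\|\boldsymbol{y} - X\boldsymbol{\theta}\|_2^2 + \lambda\|\boldsymbol{\theta}\|_1$ by proximal gradient descent (ISTA), which is a solver swapped in for simplicity. It then forms a debiased estimate with a known covariance $\Sigma$.

The code assumes the degrees-of-freedom-adjusted form used in the debiased-Lasso literature, $\widehat{\boldsymbol{\theta}}^d = \widehat{\boldsymbol{\theta}} + \Sigma^{-1}X^\top(\boldsymbol{y} - X\widehat{\boldsymbol{\theta}})/(n - \|\widehat{\boldsymbol{\theta}}\|_0)$. The uncorrected version divides by $n$ instead. The function names, the AR(1) covariance and all numerical values are hypothetical choices, not the authors' code.

```python
import numpy as np


def lasso_ista(X, y, lam, n_iter=2000):
    """Approximately solve min_theta (1/(2n))||y - X theta||_2^2 + lam*||theta||_1
    with ISTA (proximal gradient descent, fixed step size 1/L)."""
    n, p = X.shape
    # Lipschitz constant of the gradient of the least-squares term.
    L = np.linalg.norm(X, 2) ** 2 / n
    theta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ theta - y) / n
        z = theta - grad / L
        # Soft-thresholding: the proximal operator of (lam/L)*||.||_1.
        theta = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return theta


def debiased_lasso(X, y, theta_hat, Sigma, df_correction=True):
    """One-step debiasing with a known design covariance Sigma (assumed form).

    df_correction=True divides by n - ||theta_hat||_0 (degrees-of-freedom
    adjustment); df_correction=False divides by n (uncorrected version).
    """
    n = X.shape[0]
    df = np.count_nonzero(theta_hat)
    denom = n - df if df_correction else n
    if denom <= 0:
        raise ValueError("need ||theta_hat||_0 < n for the corrected estimator")
    residual = y - X @ theta_hat
    return theta_hat + np.linalg.solve(Sigma, X.T @ residual) / denom


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, s, rho = 200, 400, 10, 0.5  # hypothetical problem sizes
    # Hypothetical AR(1) covariance Sigma_ij = rho^|i-j| (non-singular).
    idx = np.arange(p)
    Sigma = rho ** np.abs(idx[:, None] - idx[None, :])
    # Rows of X are i.i.d. N(0, Sigma).
    X = rng.standard_normal((n, p)) @ np.linalg.cholesky(Sigma).T
    theta_star = np.zeros(p)
    theta_star[:s] = 1.0
    y = X @ theta_star + rng.standard_normal(n)

    theta_hat = lasso_ista(X, y, lam=0.2)
    print("||theta_hat||_0 =", np.count_nonzero(theta_hat))
    for corrected in (False, True):
        theta_d = debiased_lasso(X, y, theta_hat, Sigma, df_correction=corrected)
        # Average over the support; the true value there is 1.
        print("df_correction =", corrected, "mean on support:", theta_d[:s].mean())
```

Running the script prints the two debiased estimates averaged over the true support, so the corrected and uncorrected versions can be compared. ISTA with a fixed step $1/L$ keeps the sketch free of dependencies beyond NumPy, at the cost of slower convergence than coordinate descent.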

翻译：Lasso 是高维回归的一种方法, 当共差值的美元值与观测值的顺序相同或更高时, 通常使用这一方法。经典无症状常态理论不适用于这个模型, 原因有两个: $(1)美元正常化的风险是非单向的; $(2)美元, 估计器 $\ bloyhat_ boldsymbol_theta $ 美元和真正的参数矢量 $\boldsysylsol_theta ⁇ $ 不能忽略。因此, 标准性扰动参数是常态常态常态常态常态常态的基数。另一方面, Lasso 估计器可以精确地描述于一个制度, 美元和美元都是大的, 美元/ 美元是顺序的。这种定性首先在高斯的设计中以 I. d. d. comblatesality 格式, 我们用非正态性常态的常态常态的常态度度度, 显示我们在两个稳性格式的格式的的的格式的的的的格式的的格式的的格式中, 显示一个简单的。