We study the distribution of the maximum likelihood estimate (MLE) in high-dimensional logistic models, extending the recent results from Sur (2019) to the case where the Gaussian covariates may have an arbitrary covariance structure. We prove that, in the limit of large problems with the ratio between the number $p$ of covariates and the sample size $n$ held constant, every finite list of MLE coordinates follows a multivariate normal distribution. Concretely, the $j$th coordinate $\hat{\beta}_j$ of the MLE is asymptotically normally distributed with mean $\alpha_\star \beta_j$ and standard deviation $\sigma_\star/\tau_j$; here, $\beta_j$ is the value of the true regression coefficient, and $\tau_j$ is the standard deviation of the $j$th predictor conditional on all the others. The numerical parameters $\alpha_\star > 1$ and $\sigma_\star$ depend only on the problem dimensionality $p/n$ and the overall signal strength, and can be accurately estimated. Our results imply that the MLE's magnitude is biased upwards and that the MLE's standard deviation is greater than that predicted by classical theory. We present a series of experiments on simulated and real data showing excellent agreement with the theory.
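As an illustration of how the stated limit could be used, the following minimal sketch computes $\tau_j$ from a known covariance matrix and forms a bias-adjusted interval for $\beta_j$. It relies on the standard Gaussian identity $\tau_j^2 = 1/(\Sigma^{-1})_{jj}$ and takes the normal approximation exactly as stated above, with mean $\alpha_\star \beta_j$ and standard deviation $\sigma_\star/\tau_j$. The function names `conditional_sds` and `adjusted_interval` are hypothetical, and `alpha_star` and `sigma_star` are assumed to be supplied by the user; the sketch does not show how to estimate them.

```python
import math


def conditional_sds(cov):
    """Return tau_j = sqrt(1 / (cov^{-1})_{jj}) for each predictor j.

    For jointly Gaussian predictors, this is the standard deviation of
    predictor j conditional on all the others.
    """
    p = len(cov)
    # Augment [cov | I] and invert by Gauss-Jordan elimination with partial pivoting.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(p)]
           for i, row in enumerate(cov)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    # The right half of aug now holds the precision matrix cov^{-1}.
    return [math.sqrt(1.0 / aug[j][p + j]) for j in range(p)]


def adjusted_interval(beta_hat_j, tau_j, alpha_star, sigma_star, z=1.96):
    """Bias-adjusted point estimate and approximate interval for beta_j.

    Assumes beta_hat_j ~ N(alpha_star * beta_j, (sigma_star / tau_j)^2),
    so beta_hat_j / alpha_star is approximately centered at beta_j.
    alpha_star and sigma_star are user-supplied (hypothetical inputs here).
    """
    center = beta_hat_j / alpha_star
    half_width = z * sigma_star / (alpha_star * tau_j)
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Two predictors with unit variance and correlation 0.5.
    taus = conditional_sds([[1.0, 0.5], [0.5, 1.0]])
    print(taus)
    # Illustrative values only; not taken from the paper.
    print(adjusted_interval(beta_hat_j=2.0, tau_j=taus[0],
                            alpha_star=1.5, sigma_star=1.0))
```

Dividing by `alpha_star` undoes the upward bias in magnitude, and the interval widens as `tau_j` shrinks, that is, as predictor $j$ becomes more strongly correlated with the others.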