We consider a class of statistical estimation problems in which we are given a random data matrix ${\boldsymbol X}\in {\mathbb R}^{n\times d}$ (and possibly some labels ${\boldsymbol y}\in{\mathbb R}^n$) and would like to estimate a coefficient vector ${\boldsymbol \theta}\in{\mathbb R}^d$ (or possibly a constant number of such vectors). Special cases include low-rank matrix estimation and regularized estimation in generalized linear models (e.g., sparse regression). First order methods proceed by iteratively multiplying current estimates by ${\boldsymbol X}$ or its transpose. Examples include gradient descent and its accelerated variants. Celentano, Montanari, and Wu proved that for any constant number of iterations (matrix-vector multiplications), the optimal first order algorithm is a specific approximate message passing algorithm (known as `Bayes AMP'). The error of this estimator can be characterized in the high-dimensional asymptotics $n,d\to\infty$, $n/d\to\delta$, and provides a lower bound on the estimation error of any first order algorithm. Here we present a simpler proof of the same result, and generalize it to broader classes of data distributions and of first order algorithms, including algorithms with non-separable nonlinearities. Most importantly, the new proof technique does not require constructing an equivalent tree-structured estimation problem, and is therefore amenable to a broader range of applications.
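For concreteness, a first order algorithm of this type maintains iterates ${\boldsymbol u}^t\in{\mathbb R}^n$ and ${\boldsymbol v}^t\in{\mathbb R}^d$, alternating multiplications by ${\boldsymbol X}$ and ${\boldsymbol X}^{\mathsf T}$ with (possibly non-separable) nonlinearities. The following display is an illustrative sketch of this template, with the functions $f_t,g_t$ introduced here for exposition rather than taken from a specific definition:
\[
{\boldsymbol u}^{t} = {\boldsymbol X}\, f_t\big({\boldsymbol v}^{0},\dots,{\boldsymbol v}^{t}\big),\qquad
{\boldsymbol v}^{t+1} = {\boldsymbol X}^{\mathsf T} g_t\big({\boldsymbol u}^{0},\dots,{\boldsymbol u}^{t};{\boldsymbol y}\big),
\]
with the final estimate $\hat{\boldsymbol \theta}$ computed as a function of the iterates generated after a constant number of steps. Gradient descent on a regularized empirical risk, its accelerated variants, and AMP all correspond to particular choices of $f_t$ and $g_t$.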