We study sparse linear regression over a network of agents, modeled as an undirected graph (with no centralized node). The estimation problem is formulated as the minimization of the sum of the local LASSO loss functions plus a quadratic penalty on the consensus constraint -- the latter being instrumental to obtaining distributed solution methods. While penalty-based consensus methods have been extensively studied in the optimization literature, their statistical and computational guarantees in the high-dimensional setting remain unclear. This work provides an answer to this open problem. Our contribution is twofold. First, we establish statistical consistency of the estimator: under a suitable choice of the penalty parameter, the optimal solution of the penalized problem achieves the near-optimal minimax rate $\mathcal{O}(s \log d/N)$ in $\ell_2$-loss, where $s$ is the sparsity level, $d$ is the ambient dimension, and $N$ is the total sample size in the network -- this matches the centralized sample rate. Second, we show that the proximal-gradient algorithm applied to the penalized problem, which naturally leads to distributed implementations, converges linearly up to a tolerance of the order of the centralized statistical error -- the rate scales as $\mathcal{O}(d)$, revealing an unavoidable speed-accuracy dilemma. Numerical results demonstrate the tightness of the derived sample-rate and convergence-rate scalings.
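To make the setup concrete, the following is a minimal sketch (not the paper's implementation) of the proximal-gradient iteration on a consensus-penalized LASSO: each agent takes a gradient step on its local least-squares loss plus the quadratic disagreement penalty with its neighbors, followed by soft-thresholding. All function names, the graph, and the parameter values are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1 (elementwise shrinkage).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def dist_prox_grad(A, b, edges, lam, rho, step, iters=500):
    """Proximal-gradient sketch for the consensus-penalized LASSO
        min_{x_1..x_m}  sum_i [ (1/2n_i)||A_i x_i - b_i||^2 + lam ||x_i||_1 ]
                        + (rho/2) sum_{(i,j) in E} ||x_i - x_j||^2.
    A, b: lists of local design matrices (n_i x d) and responses;
    edges: undirected graph edges (i, j). Each gradient uses only
    local data and neighbor variables, so it runs distributedly.
    (Illustrative code; names and parameters are not from the paper.)"""
    m, d = len(A), A[0].shape[1]
    x = np.zeros((m, d))
    for _ in range(iters):
        grad = np.zeros_like(x)
        for i in range(m):                      # local least-squares gradients
            n_i = A[i].shape[0]
            grad[i] = A[i].T @ (A[i] @ x[i] - b[i]) / n_i
        for (i, j) in edges:                    # consensus-penalty gradients
            grad[i] += rho * (x[i] - x[j])
            grad[j] += rho * (x[j] - x[i])
        x = soft_threshold(x - step * grad, step * lam)  # prox of the l1 term
    return x
```

The quadratic penalty keeps the smooth part differentiable, so only the $\ell_1$ term needs a proximal step; larger `rho` enforces tighter consensus at the cost of a worse condition number, consistent with the speed-accuracy trade-off discussed above.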