Motivated by distributed machine learning settings such as Federated Learning, we consider the problem of fitting a statistical model across a distributed collection of heterogeneous data sets whose similarity structure is encoded by a graph topology. Precisely, we analyse the case where each node is associated with fitting a sparse linear model, and edges join two nodes if the difference of their solutions is also sparse. We propose a method based on Basis Pursuit Denoising with a total variation penalty, and provide finite sample guarantees for sub-Gaussian design matrices. Taking the root of the tree as a reference node, we show that if the sparsity of the differences across nodes is smaller than the sparsity at the root, then recovery is successful with fewer samples than by solving the problems independently, or by using methods that rely on a large overlap in the signal supports, such as the group Lasso. We consider both the noiseless and noisy settings, and numerically investigate the performance of distributed methods based on the Distributed Alternating Direction Method of Multipliers (ADMM), including an application to hyperspectral unmixing.
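To make the proposed program concrete, the following is a minimal sketch in our own notation (the exact weighting, constraint form, and constants are assumptions here, not the paper's precise formulation). Each node $v$ observes $y_v = A_v x_v^\star + w_v$ with a sub-Gaussian design matrix $A_v$, and the graph-regularized Basis Pursuit Denoising program reads
\[
\min_{x_1,\dots,x_n \in \mathbb{R}^d} \; \sum_{v=1}^{n} \lVert x_v \rVert_1 \;+\; \lambda \sum_{(u,v) \in E} \lVert x_u - x_v \rVert_1
\quad \text{subject to} \quad \lVert A_v x_v - y_v \rVert_2 \le \eta_v \;\; \text{for all } v,
\]
where $E$ is the edge set of the similarity graph, the hypothetical parameter $\lambda > 0$ trades off per-node sparsity against the edge-wise total variation term, and $\eta_v$ bounds the noise level at node $v$ (setting $\eta_v = 0$ recovers the noiseless Basis Pursuit case).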