We study the problem of high-dimensional sparse linear regression in a distributed setting under both computational and communication constraints. Specifically, we consider a star topology network whereby several machines are connected to a fusion center, with whom they can exchange relatively short messages. Each machine holds noisy samples from a linear regression model with the same unknown sparse $d$-dimensional vector of regression coefficients $\theta$. The goal of the fusion center is to estimate the vector $\theta$ and its support using few computations and limited communication at each machine. In this work, we consider distributed algorithms based on Orthogonal Matching Pursuit (OMP) and theoretically study their ability to exactly recover the support of $\theta$. We prove that under certain conditions, even at low signal-to-noise-ratios where individual machines are unable to detect the support of $\theta$, distributed-OMP methods correctly recover it with total communication sublinear in $d$. In addition, we present simulations that illustrate the performance of distributed OMP-based algorithms and show that they perform similarly to more sophisticated and computationally intensive methods, and in some cases even outperform them.
翻译:在计算和通信限制下,我们在分布式环境下研究高维稀薄线性回归问题。 具体地说, 我们考虑星体地形网络, 数台机器与聚变中心连接, 与集成中心交换相对短的信息。 每台机器都持有线性回归模型的噪音样本, 其回归系数为$\theta美元这一未知的稀疏维矢量。 聚变中心的目标是使用少量计算和每台机器有限的通信来估计矢量 $\theta$ 及其支持。 在这项工作中, 我们考虑了基于Orthogonal匹配跟踪( OMP) 的分布式算法( OMP ), 并在理论上研究它们准确恢复$\theta$支持的能力。 我们证明, 在某些条件下, 即使在低信号到n- noise- ratios, 个人机器无法检测到$theta美元支持值, 分配- OMP 方法正确回收了它, 其通信子线以$ 。 此外, 我们提供模拟, 演示了分布式 OMP 匹配算法的性算法的绩效, 更接近于更精细的方法, 。