We consider the problem of sparse normal means estimation in a distributed setting with communication constraints. We assume there are $M$ machines, each holding $d$-dimensional observations of a $K$-sparse vector $\mu$ corrupted by additive Gaussian noise. The $M$ machines are connected in a star topology to a fusion center, whose goal is to estimate the vector $\mu$ with a low communication budget. Previous works have shown that to achieve the centralized minimax rate for the $\ell_2$ risk, the total communication must be high: at least linear in the dimension $d$. This phenomenon occurs, however, at very weak signals. We show that at signal-to-noise ratios (SNRs) that are sufficiently high, but not high enough for recovery by any individual machine, the support of $\mu$ can be correctly recovered with significantly less communication. Specifically, we present two algorithms for distributed estimation of a sparse mean vector corrupted by either Gaussian or sub-Gaussian noise. We then prove that above certain SNR thresholds, with high probability, these algorithms recover the correct support with total communication that is sublinear in the dimension $d$. Furthermore, the communication decreases exponentially as a function of signal strength. If, in addition, $KM\ll \tfrac{d}{\log d}$, then with one additional round of sublinear communication, our algorithms achieve the centralized rate for the $\ell_2$ risk. Finally, we present simulations that illustrate the performance of our algorithms in different parameter regimes.
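To make the setting concrete, here is a minimal simulation sketch. It is not one of the two algorithms from the paper, whose details are not given in this abstract. It assumes a simple threshold-and-vote scheme: each machine sends the fusion center only the indices whose observed value exceeds a threshold, and the fusion center keeps the $K$ most frequently reported indices. The function name `simulate`, the parameter values, the unit noise variance, the positive and equal signal amplitudes, and the fusion center knowing $K$ are all illustrative assumptions.

```python
import random


def simulate(d=2000, K=5, M=50, amplitude=3.0, threshold=3.0, seed=0):
    """Toy threshold-and-vote support recovery (illustrative assumption only).

    Each of M machines observes mu + N(0, I_d) noise, where mu is K-sparse
    with equal positive entries `amplitude` on its support.
    """
    rng = random.Random(seed)
    support = set(rng.sample(range(d), K))  # hypothetical true support

    votes = [0] * d   # votes[j] = number of machines that reported index j
    total_sent = 0    # total number of indices sent to the fusion center

    for _ in range(M):
        for j in range(d):
            signal = amplitude if j in support else 0.0
            x = signal + rng.gauss(0.0, 1.0)
            # A machine reports only coordinates that look significant.
            if x > threshold:
                votes[j] += 1
                total_sent += 1

    # Fusion center: keep the K indices with the most votes (K assumed known).
    ranked = sorted(range(d), key=lambda j: votes[j], reverse=True)
    estimate = set(ranked[:K])
    return support, estimate, total_sent


if __name__ == "__main__":
    support, estimate, total_sent = simulate()
    print("support recovered:", estimate == support)
    print("indices sent in total:", total_sent, "vs. dimension d = 2000")
```

With these assumed parameters, a support coordinate crosses the threshold on roughly half of the machines, while a null coordinate does so with probability about 0.001 per machine, so the vote counts separate clearly even though the signal amplitude is below $\sqrt{2\log d}\approx 3.9$. The total number of transmitted indices stays well below $d$, and raising the signal amplitude allows a higher threshold and hence fewer false reports.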