In the era of big data, it is necessary to split extremely large data sets across multiple computing nodes and construct estimators using the distributed data. When designing distributed estimators, it is desirable to minimize the amount of communication across the network because transmission between computers is slow in comparison to computations in a single computer. Our work provides a general framework for understanding the behavior of distributed estimation under communication constraints for nonparametric problems. We provide results for a broad class of models, moving beyond the Gaussian framework that dominates the literature. As concrete examples we derive minimax lower and matching upper bounds in the distributed regression, density estimation, classification, Poisson regression and volatility estimation models under communication constraints. To assist with this, we provide sufficient conditions that can be easily verified in all of our examples.