The rapid emergence of massive datasets in various fields poses a serious challenge to traditional statistical methods, but it also gives researchers opportunities to develop novel algorithms. Inspired by the idea of divide-and-conquer, various distributed frameworks for statistical estimation and inference have been proposed to handle large-scale statistical optimization problems. This paper provides a comprehensive review of the related literature, covering parametric models, nonparametric models, and other frequently used models. For each, the key ideas and theoretical properties are summarized. The trade-off between communication cost and estimation precision is discussed, together with other related issues.
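As a minimal illustration of the divide-and-conquer idea, the sketch below applies one-shot averaging to simple linear regression. The data are split into k blocks, each block (a stand-in for one machine) fits the model locally, and the center averages the k local estimates. This scheme is chosen here only because it is one of the simplest instances of the idea. The function names, the simulated data, and the equal-sized blocks are illustrative assumptions and are not taken from the paper. Each block sends just two numbers in a single round, which shows the low-communication end of the trade-off described above.

```python
import random
from statistics import fmean


def ols_fit(xs, ys):
    """Closed-form least-squares fit of y = a + b*x on one data block."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def one_shot_average(xs, ys, k):
    """Divide-and-conquer estimate (illustrative): fit k blocks locally, then average.

    Each block communicates only its local (a, b) to the center, i.e. one round.
    The last block absorbs any remainder when len(xs) is not divisible by k.
    """
    n = len(xs)
    size = n // k
    local = []
    for j in range(k):
        lo = j * size
        hi = (j + 1) * size if j < k - 1 else n
        local.append(ols_fit(xs[lo:hi], ys[lo:hi]))
    return fmean(a for a, _ in local), fmean(b for _, b in local)


# Simulated data (hypothetical): y = 2.0 + 0.5 * x + Gaussian noise.
rng = random.Random(0)
n = 10_000
xs = [rng.uniform(0, 10) for _ in range(n)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]

full = ols_fit(xs, ys)                # fit using all data in one place
dc = one_shot_average(xs, ys, k=10)   # averaged fit from 10 blocks
print("full-data fit:", full)
print("one-shot average:", dc)
```

With large blocks, the averaged estimate is typically close to the full-data fit. More refined distributed methods use additional communication rounds or bias corrections to improve precision when the blocks are small.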