When data are stored in a distributed manner, direct application of traditional statistical inference procedures is often prohibitive due to communication cost and privacy concerns. This paper develops and investigates two Communication-Efficient Accurate Statistical Estimators (CEASE), implemented through iterative algorithms for distributed optimization. In each iteration, node machines carry out computation in parallel and communicate with the central processor, which then broadcasts aggregated information back to the node machines for new updates. The algorithms adapt to the similarity among the loss functions on the node machines and converge rapidly when each node machine has a sufficiently large sample size. Moreover, they do not require good initialization and enjoy linear convergence guarantees under general conditions. The contraction rate of the optimization error is given explicitly, revealing its dependence on the local sample size. We also derive the improvement in statistical accuracy achieved at each iteration. By regarding the proposed method as a multi-step statistical estimator, we show that statistical efficiency can be achieved in a finite number of steps in typical statistical applications. Furthermore, we give conditions under which the one-step CEASE estimator is statistically efficient. Extensive numerical experiments on both synthetic and real data validate the theoretical results and demonstrate the superior performance of our algorithms.
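The iterative scheme described in the abstract (parallel local computation, aggregation at a central processor, broadcast, new local updates) can be illustrated with a minimal sketch. It assumes distributed least squares with equal local sample sizes, so that the global loss is the average of the local losses, and it uses a DANE-style gradient-enhanced local loss plus a proximal term (alpha/2)‖θ − θ_t‖². This is an illustration under those assumptions, not the paper's exact algorithm; the function name `cease_least_squares` and the parameters `alpha` and `n_iter` are hypothetical. For quadratic losses the local subproblem has the closed-form solution θ_j = θ_t − (H_j + αI)^{-1} g, where H_j is the local Hessian and g is the aggregated global gradient.

```python
import numpy as np


def cease_least_squares(X_parts, y_parts, alpha=0.1, n_iter=20, theta0=None):
    """Hypothetical sketch of a CEASE-style iteration for distributed least squares.

    Assumption: machine j holds (X_j, y_j) with local loss
    f_j(theta) = ||y_j - X_j theta||^2 / (2 n_j), and the global loss is
    the average of the f_j (equal local sample sizes). Each iteration:
      1. machines send local gradients; the center averages them into g;
      2. each machine minimizes its gradient-enhanced local loss plus a
         proximal term alpha/2 * ||theta - theta_t||^2;
      3. the center averages the local solutions and broadcasts the result.
    """
    d = X_parts[0].shape[1]
    # No careful initialization is needed: start from zero by default.
    theta = np.zeros(d) if theta0 is None else np.asarray(theta0, dtype=float)
    # Local Hessians H_j and right-hand sides b_j, computed once per machine.
    hessians = [X.T @ X / X.shape[0] for X in X_parts]
    rhs = [X.T @ y / X.shape[0] for X, y in zip(X_parts, y_parts)]
    eye = np.eye(d)
    for _ in range(n_iter):
        # Communication round 1: local gradients, averaged into the global gradient.
        local_grads = [H @ theta - b for H, b in zip(hessians, rhs)]
        g = np.mean(local_grads, axis=0)
        # Local update: closed-form minimizer of the proximal, gradient-enhanced
        # quadratic subproblem, theta_j = theta - (H_j + alpha I)^{-1} g.
        local_sols = [theta - np.linalg.solve(H + alpha * eye, g) for H in hessians]
        # Communication round 2: the center averages and broadcasts.
        theta = np.mean(local_sols, axis=0)
    return theta


# Usage: four machines, 200 observations each, three features.
rng = np.random.default_rng(0)
m, n, d = 4, 200, 3
theta_star = np.array([1.0, -2.0, 0.5])
X_parts = [rng.standard_normal((n, d)) for _ in range(m)]
y_parts = [X @ theta_star + 0.1 * rng.standard_normal(n) for X in X_parts]
theta_hat = cease_least_squares(X_parts, y_parts, alpha=0.1, n_iter=30)
```

In this sketch each step behaves like an approximate Newton step that uses the local Hessian H_j in place of the global one; when local samples are large, H_j is close to the global Hessian and the error contracts quickly, which mirrors the abstract's point that the contraction rate depends on the local sample size.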