The rapid growth of online network platforms generates large-scale network data, which poses great challenges for statistical analysis using the spatial autoregression (SAR) model. In this work, we develop a novel distributed estimation and statistical inference framework for the SAR model on a distributed system. We first propose a distributed network least squares approximation (DNLSA) method. In addition, we provide theoretical guarantees for the distributed statistical inference procedure. The theoretical findings and computational advantages are validated by several numerical simulations implemented on the Spark system. Finally, an experiment on the Yelp dataset further illustrates the usefulness of the proposed methodology.
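The abstract does not spell out the DNLSA estimator, so the following is only a hedged illustration of the generic least squares approximation idea that its name refers to. It is not the authors' method. Each worker computes a local estimate and a local Hessian, and a master node combines them by weighted least squares, minimising the sum over k of (theta - theta_k)^T H_k (theta - theta_k). To keep the sketch self-contained, it assumes plain linear regression rather than the SAR model, and the function names `local_statistics` and `combine_lsa` are hypothetical.

```python
import numpy as np

def local_statistics(X_k, y_k):
    """Hypothetical worker step: local OLS estimate and local Hessian X_k^T X_k."""
    H_k = X_k.T @ X_k
    theta_k = np.linalg.solve(H_k, X_k.T @ y_k)
    return theta_k, H_k

def combine_lsa(stats):
    """Hypothetical master step: weighted least squares combination.

    Minimises sum_k (theta - theta_k)^T H_k (theta - theta_k), whose closed-form
    solution is (sum_k H_k)^{-1} sum_k H_k theta_k.
    """
    H_total = sum(H_k for _, H_k in stats)
    weighted = sum(H_k @ theta_k for theta_k, H_k in stats)
    return np.linalg.solve(H_total, weighted)

# Simulated data for an ordinary linear model (an assumption, not SAR data).
rng = np.random.default_rng(0)
n, p, K = 1200, 3, 4
X = rng.normal(size=(n, p))
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta + rng.normal(scale=0.1, size=n)

# Split the rows across K hypothetical workers; each returns only (theta_k, H_k).
stats = [local_statistics(X_k, y_k)
         for X_k, y_k in zip(np.array_split(X, K), np.array_split(y, K))]
theta_lsa = combine_lsa(stats)

# Compare with the full-data OLS fit computed on a single machine.
theta_full = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(theta_lsa, theta_full))  # True
```

For linear regression, the combination reproduces the full-data OLS estimate exactly, because sum_k H_k theta_k = X^T y. For a likelihood-based model such as SAR, the same combination is only a local quadratic approximation. In addition, observations that are linked through the network across different workers need extra treatment, which this sketch does not attempt.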