The analysis of network data has gained considerable interest in recent years. This includes the analysis of large, high-dimensional networks with hundreds or thousands of nodes. While Exponential Random Graph Models (ERGMs) serve as the workhorse for network data analysis, applying them to very large networks is problematic with classical inference such as maximum likelihood or fully Bayesian estimation, owing to scaling and instability issues. The latter stem from the fact that classical network statistics treat nodes as exchangeable, i.e., actors in the network are assumed to be homogeneous. This assumption is often questionable, and one way to circumvent it is to include actor-specific random effects that account for unobservable heterogeneity. This, however, considerably increases the number of unknowns, making the model highly parameterized. As a solution that works even for very large networks, we propose a scalable approach based on variational approximations, which not only leads to numerically stable estimation but is also applicable to high-dimensional directed as well as undirected networks. We furthermore show that including node-specific covariates can reduce node heterogeneity, which we facilitate through versatile prior formulations. We demonstrate the procedure in two complex examples, namely Facebook data and data from an international arms trading network.