The analysis of network data has gained considerable interest in recent years. This also includes the analysis of large, high-dimensional networks with hundreds and thousands of nodes. While exponential random graph models serve as workhorse for network data analyses, their applicability to very large networks is problematic via classical inference such as maximum likelihood or exact Bayesian estimation owing to scaling and instability issues. The latter trace from the fact that classical network statistics consider nodes as exchangeable, i.e., actors in the network are assumed to be homogeneous. This is often questionable. One way to circumvent the restrictive assumption is to include actor-specific random effects, which account for unobservable heterogeneity. However, this increases the number of unknowns considerably, thus making the model highly-parameterized. As a solution even for very large networks, we propose a scalable approach based on variational approximations, which not only leads to numerically stable estimation but is also applicable to high-dimensional directed as well as undirected networks. We furthermore demonstrate that including node-specific covariates can reduce node heterogeneity, which we facilitate through versatile prior formulations and a new measure that we call posterior explained variance. We illustrate our approach in three diverse examples, covering network data from the Italian Parliament, international arms trading, and Facebook; and conduct detailed simulation studies.
翻译:暂无翻译