This article studies the estimation of latent community memberships from pairwise interactions in a network of $N$ nodes, where the observed interactions can be of arbitrary type, including binary, categorical, and vector-valued, and not excluding even more general objects such as time series or spatial point patterns. As a generative model for such data, we introduce a stochastic block model with a general measurable interaction space $\mathcal S$, for which we derive information-theoretic bounds for the minimum achievable error rate. These bounds yield sharp criteria for the existence of consistent and strongly consistent estimators in terms of data sparsity, statistical similarity between intra- and inter-block interaction distributions, and the shape and size of the interaction space. The general framework makes it possible to study temporal and multiplex networks with $\mathcal S = \{0,1\}^T$, in settings where both $N \to \infty$ and $T \to \infty$, and the temporal interaction patterns are correlated over time. For temporal Markov interactions, we derive sharp consistency thresholds. We also present fast online estimation algorithms which fully utilise the non-binary nature of the observed data. Numerical experiments on synthetic and real data show that these algorithms rapidly produce accurate estimates even for very sparse data arrays.
翻译:文章从一个由$N的节点组成的网络的双向互动中估算潜在社区成员,观测到的互动可以是任意类型的,包括二进制、绝对和矢量估值,不排除甚至更一般的物体,例如时间序列或空间点模式。作为这些数据的基因模型,我们引入了一个具有一般可测量互动空间的随机区块模型,用于为最小可实现误差率生成信息-理论界限。这些界限为数据宽度、区内和区际互动分布之间的统计相似性以及互动空间的形状和大小方面的一致性和强烈一致的估测者的存在提供了尖锐标准。这个总框架使得有可能用$\mathcal S = 0. 1 ⁇ T美元来研究时间和多轴网络,在这种环境中,我们为最小的误差率和最小的偏差率提供了信息-理论界限。对于时间互动模式来说,我们得出了精确的一致性临界值,我们甚至得出了精确的线内和区间互动界限。我们还展示了这些快速的模型,用以对数据进行快速的模拟。