Given a sequence of random (directed and weighted) graphs, we address the problem of online monitoring and detection of changes in the underlying data distribution. Our idea is to endow sequential change-point detection (CPD) techniques with a graph representation learning substrate based on the versatile Random Dot Product Graph (RDPG) model. We consider efficient, online updates of a judicious monitoring function, which quantifies the discrepancy between the streaming graph observations and the nominal RDPG. This reference distribution is inferred via spectral embeddings of the first few graphs in the sequence. We characterize the distribution of this running statistic to select thresholds that guarantee error-rate control, and under simplifying approximations we offer insights on the algorithm's detection resolution and delay. The end result is a lightweight online CPD algorithm that is also explainable, by virtue of the well-appreciated interpretability of RDPG embeddings. This is in stark contrast with most existing graph CPD approaches, which either rely on extensive computation or store and process the entire observed time series. An apparent limitation of the RDPG model is its suitability for undirected and unweighted graphs only, a gap we aim to close here to broaden the scope of the CPD framework. Unlike previous proposals, our non-parametric RDPG model for weighted graphs does not require a priori specification of the weights' distribution to perform inference and estimation. This network modeling contribution is of independent interest beyond CPD. We offer an open-source implementation of the novel online CPD algorithm for weighted and directed graphs, whose effectiveness and efficiency are demonstrated via (reproducible) synthetic and real network data experiments.
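To make the pipeline concrete, the following is a minimal sketch of RDPG-based online monitoring under simplifying assumptions; it is not the authors' monitoring function or threshold-selection rule. It assumes a directed RDPG fitted by a rank-d truncated SVD of the average of the first m adjacency matrices (left and right embeddings X, Y with A ≈ X Yᵀ), a hypothetical windowed mean-squared-residual statistic, and a user-chosen `threshold`. The function names `ase_directed` and `monitor` and the parameters `window` and `threshold` are illustrative only. Since the residual is computed entrywise, the same code accepts weighted adjacency matrices without specifying a weight distribution.

```python
import numpy as np


def ase_directed(A_bar, d):
    """Rank-d spectral embedding of a (possibly directed, weighted) matrix.

    Returns left/right embeddings X, Y such that A_bar is approximated by X @ Y.T.
    """
    U, s, Vt = np.linalg.svd(A_bar)
    root = np.sqrt(s[:d])
    return U[:, :d] * root, Vt[:d].T * root


def monitor(graphs, m, d, window, threshold):
    """Hypothetical sketch of online CPD against a nominal RDPG.

    Fits the nominal model on the first m graphs, then, for each new graph,
    computes the mean squared residual to the nominal probability matrix and
    flags the first time step whose windowed average exceeds `threshold`.
    Returns (t, stat) at the alarm, or (None, None) if no alarm is raised.
    """
    X, Y = ase_directed(np.mean(graphs[:m], axis=0), d)
    P_hat = X @ Y.T  # nominal (estimated) edge-probability / weight matrix
    n = P_hat.shape[0]
    errors = []
    for t in range(m, len(graphs)):
        # Per-graph discrepancy: mean squared entrywise residual.
        errors.append(np.sum((graphs[t] - P_hat) ** 2) / n**2)
        # Running statistic: average over the most recent `window` graphs.
        stat = np.mean(errors[-window:])
        if stat > threshold:
            return t, stat
    return None, None
```

As a usage example, a stream of 50-node directed graphs with edge probability 0.1 that switches to 0.5 at t = 40 should trigger an alarm within a couple of steps of the change. Under this model the nominal residual is roughly 0.1 × 0.9 = 0.09 per entry, well below a threshold of 0.2. A windowed average trades a short detection delay for robustness to single noisy graphs; the paper instead characterizes the statistic's distribution to set thresholds with error-rate control.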