This paper addresses the problem of detecting hierarchical changes in latent variable models (HCDL) from data streams. Changes in a latent variable model can occur at three levels: 1) in the data distribution given fixed latent variables, 2) in the distribution over the latent variables, and 3) in the number of latent variables. Detecting these changes matters because identifying the level at which a change occurs lets us analyze its cause (change interpretability). We propose an information-theoretic framework that detects changes at the three levels in a hierarchical way. The key idea is to employ the MDL (minimum description length) change statistics to measure the degree of change, in combination with DNML (decomposed normalized maximum likelihood) code-length calculation. We give a theoretical basis for raising reliable alarms for changes. Focusing on stochastic block models, we use synthetic and benchmark datasets to demonstrate empirically the effectiveness of our framework in terms of both change interpretability and change detection.
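To make the notion of an MDL change statistic concrete, the following is a minimal sketch for the simplest possible setting: a binary sequence under a Bernoulli model with no latent variables. It compares the code length of the whole window with the total code length of the two halves split at a candidate point t; a large positive value suggests a change at t. This sketch uses the plain NML code length, not the DNML code length for stochastic block models that the framework relies on, and the function names `bernoulli_nml_codelength` and `mdl_change_statistic` are hypothetical names chosen for illustration.

```python
import math


def bernoulli_nml_codelength(xs):
    """NML code length (in nats) of a binary sequence under the Bernoulli model."""
    n = len(xs)
    if n == 0:
        return 0.0
    k = sum(xs)

    # Negative log-likelihood at the maximum likelihood estimate k/n.
    # Terms with a zero count contribute nothing (0 * log 0 is taken as 0).
    neg_max_loglik = 0.0
    for count in (k, n - k):
        if count > 0:
            neg_max_loglik -= count * math.log(count / n)

    # Parametric complexity: sum over all possible counts j of the
    # maximized probability of observing j ones in n trials.
    total = 0.0
    for j in range(n + 1):
        log_term = math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
        if j > 0:
            log_term += j * math.log(j / n)
        if n - j > 0:
            log_term += (n - j) * math.log((n - j) / n)
        total += math.exp(log_term)

    return neg_max_loglik + math.log(total)


def mdl_change_statistic(xs, t):
    """Per-symbol code-length reduction obtained by splitting xs at index t."""
    n = len(xs)
    whole = bernoulli_nml_codelength(xs)
    split = bernoulli_nml_codelength(xs[:t]) + bernoulli_nml_codelength(xs[t:])
    return (whole - split) / n


if __name__ == "__main__":
    # Assumed toy data: the Bernoulli parameter jumps from 0 to 1 at index 50.
    stream = [0] * 50 + [1] * 50
    scores = {t: mdl_change_statistic(stream, t) for t in range(1, len(stream))}
    best_t = max(scores, key=scores.get)
    print(best_t, scores[best_t])
```

In a streaming setting one would typically evaluate such a statistic over a sliding window and raise an alarm when it exceeds a threshold; how that threshold is chosen is where a theoretical basis for reliable alarms comes in, and it is not modeled here.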