Detecting changes is of fundamental importance when analyzing data streams and has many applications, e.g., predictive maintenance, fraud detection, or medicine. A principled approach to detecting changes is to compare the distributions of observations within the stream to each other via hypothesis testing. Maximum mean discrepancy (MMD; also called energy distance) is a well-known (semi-)metric on the space of probability distributions. MMD gives rise to powerful non-parametric two-sample tests on kernel-enriched domains under mild conditions, which makes its deployment for change detection desirable. However, the classic MMD estimators have quadratic runtime complexity in the number of observations, which prohibits their application in the online change detection setting. We propose a general-purpose change detection algorithm, Maximum Mean Discrepancy on Exponential Windows (MMDEW), which leverages the MMD two-sample test, facilitates its efficient online computation on any kernel-enriched domain, and is able to detect any disparity between distributions. Our experiments and analysis show that (1) MMDEW achieves better detection quality than state-of-the-art competitors and that (2) the algorithm has polylogarithmic runtime and logarithmic memory requirements, which allow its deployment in the streaming setting.
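To make the quadratic cost of the classic estimators concrete, the following is a minimal sketch of the standard biased (V-statistic) estimate of squared MMD between two samples. It is not the MMDEW algorithm. The Gaussian kernel, the `bandwidth` parameter, and the function names `gaussian_gram` and `mmd2_biased` are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def gaussian_gram(a, b, bandwidth=1.0):
    """Gram matrix of a Gaussian kernel between the rows of a and b.

    The kernel choice and the bandwidth are assumptions made for illustration.
    """
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd2_biased(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples x and y.

    Builds three full Gram matrices, so time and memory grow quadratically
    with the sample sizes.
    """
    kxx = gaussian_gram(x, x, bandwidth)
    kyy = gaussian_gram(y, y, bandwidth)
    kxy = gaussian_gram(x, y, bandwidth)
    return kxx.mean() + kyy.mean() - 2.0 * kxy.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    before = rng.normal(size=(200, 2))          # observations before a change
    same = rng.normal(size=(200, 2))            # same distribution
    after = rng.normal(loc=2.0, size=(200, 2))  # mean-shifted distribution
    print("no change:", mmd2_biased(before, same))
    print("change:   ", mmd2_biased(before, after))
```

Recomputing such an estimate from scratch at every candidate split point of a growing stream is what makes the naive approach unsuitable for online use; the estimate for the shifted sample should come out clearly larger than for the unshifted one.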