Detecting changes in data streams is a core objective in their analysis, with applications in predictive maintenance, fraud detection, and medicine, among others. A principled approach to detecting changes is to compare the distributions observed within the stream to each other. However, data streams are often high-dimensional, and changes can be complex, for example manifesting only in higher moments. The streaming setting also imposes heavy restrictions on memory and computation. We propose an algorithm, Maximum Mean Discrepancy Adaptive Windowing (MMDAW), which leverages the well-known Maximum Mean Discrepancy (MMD) two-sample test and enables its efficient online computation on windows whose size it adapts flexibly. Because MMD is sensitive to any change in the underlying distribution, our algorithm is a general-purpose non-parametric change detector that fulfills the requirements imposed by the streaming setting. Our experiments show that MMDAW achieves better detection quality than state-of-the-art competitors.
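
To make the MMD comparison mentioned above concrete, the following is a minimal sketch of the standard biased (V-statistic) estimate of the squared MMD between two windows of observations. It is not the MMDAW algorithm: the Gaussian kernel, the fixed `bandwidth`, the function names, and the synthetic data are all assumptions chosen for illustration, and the adaptive windowing and efficient online updates are not shown.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points given as tuples of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples.

    Computes mean k(x, x') + mean k(y, y') - 2 * mean k(x, y) over all pairs.
    The cost is quadratic in the window sizes, so this is a naive baseline.
    """
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Illustrative synthetic windows (assumed data, one-dimensional for brevity).
rng = random.Random(0)
before = [(rng.gauss(0.0, 1.0),) for _ in range(200)]
same = [(rng.gauss(0.0, 1.0),) for _ in range(200)]   # no change
wider = [(rng.gauss(0.0, 3.0),) for _ in range(200)]  # same mean, larger variance

mmd_no_change = mmd_squared(before, same)
mmd_variance_change = mmd_squared(before, wider)
```

In this sketch the second pair of windows differs only in variance, not in mean, yet the estimate is expected to be noticeably larger for it than for the unchanged pair. Turning the statistic into a test would additionally require a rejection threshold (for instance from a permutation procedure or a concentration bound), which is omitted here.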