Change point detection plays a fundamental role in many real-world applications, where the goal is to analyze and monitor the behaviour of a data stream. In this paper, we study change detection in binary streams. To this end, we use a likelihood ratio between two models as a measure for indicating change. The first model is a single bernoulli variable while the second model divides the stored data in two segments, and models each segment with its own bernoulli variable. Finding the optimal split can be done in $O(n)$ time, where $n$ is the number of entries since the last change point. This is too expensive for large $n$. To combat this we propose an approximation scheme that yields $(1 - \epsilon)$ approximation in $O(\epsilon^{-1} \log^2 n)$ time. The speed-up consists of several steps: First we reduce the number of possible candidates by adopting a known result from segmentation problems. We then show that for fixed bernoulli parameters we can find the optimal change point in logarithmic time. Finally, we show how to construct a candidate list of size $O(\epsilon^{-1} \log n)$ for model parameters. We demonstrate empirically the approximation quality and the running time of our algorithm, showing that we can gain a significant speed-up with a minimal average loss in optimality.
翻译:更改点检测在许多真实世界应用中起着根本作用, 目标在于分析和监测数据流的行为。 在本文中, 我们研究二元流中的改变检测。 为此, 我们使用两个模型之间的概率比来测量变化。 第一个模型是一个单一的伯努利变量, 而第二个模型将存储的数据分成两个部分, 每个部分的模型都有自己的伯诺利变量。 找到最佳的分割可以在 $( n) 时间里完成, 美元是自上次更改点以来的条目数。 这对大美元来说太贵了。 为了打击这个选项, 我们提议了一个近似方案, 以美元( 1 -\ epsilon) 来计算( $ ( 1 -\ epsilon) 的近似值来表示变化。 速度加起来由几个步骤组成 : 首先, 我们通过采用分解问题的已知结果来减少可能的候选人数量 。 我们然后显示, 对于固定的伯努利参数, 我们可以找到在对正对数值时间里找到最佳的更改点 。 最后, 我们展示如何用一个最佳的候选人质量列表来显示我们的平均速度。