Responding appropriately to the detections of a sequential change detector requires knowledge of the rate at which false positives occur in the absence of change. When the pre-change and post-change distributions are unknown, setting detection thresholds to achieve a desired false positive rate is challenging, even when a large number of samples from the reference distribution is available. Existing works resort to setting time-invariant thresholds that focus on the expected runtime of the detector in the absence of change, either bounding it loosely from below or targeting it directly but with asymptotic arguments that we show cause significant miscalibration in practice. We present a simulation-based approach to setting time-varying thresholds that allows a desired expected runtime to be targeted with a 20x reduction in miscalibration whilst additionally keeping the false positive rate constant across time steps. Whilst the approach to threshold setting is metric-agnostic, we show that when using the popular and powerful quadratic-time MMD estimator, thoughtful structuring of the computation can reduce the cost during configuration from $O(N^2B)$ to $O(N^2+NB)$ and during operation from $O(N^2)$ to $O(N)$, where $N$ is the number of reference samples and $B$ the number of bootstrap samples. Code is made available as part of the open-source Python library \texttt{alibi-detect}.
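To make the idea of simulated, time-varying thresholds concrete, the following is a minimal sketch under stated assumptions; it is not the implementation in \texttt{alibi-detect} and not necessarily the exact procedure of the paper. It assumes that streams of the test statistic can be simulated under no change, and it sets the threshold at each time step to the $(1-\alpha)$ quantile of the statistic over the simulated streams that have not yet raised an alarm, with $\alpha = 1/\mathrm{ERT}$, so that the conditional false positive rate is roughly constant across steps. The function names (`simulate_statistics`, `fit_thresholds`, `conditional_fpr`) and the stand-in statistic (absolute mean of a sliding window over standard normal data) are hypothetical and chosen only for illustration.

```python
import numpy as np


def simulate_statistics(rng, n_streams, n_steps, window=10):
    """Hypothetical stand-in for a test statistic (not MMD): the absolute
    mean of a sliding window over i.i.d. N(0, 1) data, simulated with no
    change. Returns an array of shape (n_streams, n_steps)."""
    x = rng.standard_normal((n_streams, n_steps + window - 1))
    # Cumulative sums with a leading zero column give O(1) window sums.
    csum = np.cumsum(
        np.concatenate([np.zeros((n_streams, 1)), x], axis=1), axis=1
    )
    window_means = (csum[:, window:] - csum[:, :-window]) / window
    return np.abs(window_means)


def fit_thresholds(stats, ert):
    """Set one threshold per time step so that, among simulated streams
    that have not yet fired, a fraction of about 1 / ert fires at each step."""
    alpha = 1.0 / ert
    n_streams, n_steps = stats.shape
    alive = np.ones(n_streams, dtype=bool)  # streams with no alarm so far
    thresholds = np.empty(n_steps)
    for t in range(n_steps):
        # Quantile is taken only over streams still running at step t.
        thresholds[t] = np.quantile(stats[alive, t], 1.0 - alpha)
        alive &= stats[:, t] <= thresholds[t]
    return thresholds


def conditional_fpr(stats, thresholds):
    """Per-step false positive rate, conditional on no earlier alarm."""
    alive = np.ones(stats.shape[0], dtype=bool)
    rates = np.empty(stats.shape[1])
    for t in range(stats.shape[1]):
        fired = stats[:, t] > thresholds[t]
        rates[t] = fired[alive].mean()
        alive &= ~fired
    return rates


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    stats = simulate_statistics(rng, n_streams=20000, n_steps=100)
    thresholds = fit_thresholds(stats, ert=50)
    rates = conditional_fpr(stats, thresholds)
    print("first thresholds:", thresholds[:5])
    print("min/max conditional FPR:", rates.min(), rates.max())
```

Because successive window statistics overlap and are therefore correlated, a single time-invariant threshold would generally not give a constant per-step false positive rate; conditioning each quantile on the surviving streams is what lets the thresholds vary over time in this sketch. The number of simulated streams must be large enough that a useful number of them survive to the last step of interest.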