We study outlier (a.k.a. anomaly) detection for single-pass non-stationary streaming data. In the well-studied offline or batch outlier detection problem, traditional methods such as kernel One-Class SVM (OCSVM) are both computationally heavy and prone to large false-negative (Type II) errors under non-stationarity. To remedy this, we introduce SONAR, an efficient SGD-based OCSVM solver with strongly convex regularization. We prove novel theoretical guarantees on the Type I/II errors of SONAR, superior to those known for OCSVM, and further prove that SONAR ensures favorable lifelong learning guarantees under benign distribution shifts. In the more challenging problem of adversarial non-stationary data, we show that SONAR can be used within an ensemble method and equipped with changepoint detection to achieve adaptive guarantees, ensuring small Type I/II errors on each phase of data. We validate our theoretical findings on synthetic and real-world datasets.
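To make the phrase "SGD-based OCSVM solver with strongly convex regularization" concrete, the following is a minimal sketch of single-pass SGD on a linear one-class SVM objective. It is not SONAR itself: the per-sample loss, the quadratic penalty on the offset `rho`, the `1/(lam * t)` step size, and the names `sgd_one_class`, `nu`, and `lam` are all assumptions chosen for illustration, and the abstract does not specify any of them.

```python
import numpy as np


def sgd_one_class(stream, dim, nu=0.1, lam=1.0):
    """Single-pass SGD on an assumed strongly convex one-class objective.

    Assumed per-sample loss (illustrative, not taken from the paper):
        (lam / 2) * (||w||^2 + rho^2) - rho + (1 / nu) * max(0, rho - <w, x>)
    A point is flagged as an outlier when <w, x> < rho.
    """
    w = np.zeros(dim)
    rho = 0.0
    flags = []
    for t, x in enumerate(stream, start=1):
        x = np.asarray(x, dtype=float)
        # Score the point with the current model before updating on it.
        violated = float(w @ x) < rho
        flags.append(violated)
        # 1 / (lam * t) is the usual step size for a lam-strongly convex loss.
        eta = 1.0 / (lam * t)
        # Subgradients of the assumed loss; the hinge term is active only
        # when the margin constraint is violated.
        grad_w = lam * w - (x / nu if violated else 0.0)
        grad_rho = lam * rho - 1.0 + (1.0 / nu if violated else 0.0)
        w = w - eta * grad_w
        rho = rho - eta * grad_rho
    return w, rho, np.array(flags, dtype=bool)


# Hypothetical usage on a stationary synthetic stream.
rng = np.random.default_rng(0)
inliers = rng.normal(loc=3.0, scale=0.5, size=(500, 2))
w, rho, flags = sgd_one_class(inliers, dim=2)
print("fraction flagged:", flags.mean())
```

Each point is seen once and costs O(dim) time and memory, which is the contrast with batch kernel OCSVM that the abstract draws. The sketch does not cover the ensemble or changepoint-detection components used for adversarial non-stationarity.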