Most current anomaly detection methods suffer from the curse of dimensionality when applied to high-dimensional data. We propose an anomaly detection algorithm that scales to high-dimensional data using concepts from the theory of large deviations. The proposed Large Deviations Anomaly Detection (LAD) algorithm is shown to outperform state-of-the-art anomaly detection methods on a variety of large and high-dimensional benchmark data sets. Exploiting the algorithm's ability to scale to high-dimensional data, we propose an online anomaly detection method to identify anomalies in a collection of multivariate time series. We demonstrate the applicability of the online algorithm by identifying counties in the United States with anomalous trends in COVID-19 related cases and deaths. Several of the identified anomalous counties correspond to counties with a documented poor response to the COVID-19 pandemic.