Anomaly Search over Composite Hypotheses in Hierarchical Statistical Models

Detection of anomalies among a large number of processes is a fundamental task that has been studied in multiple research areas, with applications ranging from spectrum access to cyber-security. Anomalous events are characterized by deviations in data distributions, and thus can be inferred from noisy observations using statistical methods. In some scenarios, one can obtain noisy observations aggregated from a chosen subset of processes. Such hierarchical search can further reduce the sample complexity while retaining accuracy. An anomaly search strategy should thus be designed to meet multiple requirements: it should maximize the detection accuracy, be efficient in terms of sample complexity, and be able to cope with statistical models that are known only up to some missing parameters (i.e., composite hypotheses). In this paper, we consider anomaly detection with observations taken from a chosen subset of processes that conforms to a predetermined tree structure, under a partially known statistical model. We propose Hierarchical Dynamic Search (HDS), a sequential search strategy that uses two variations of the Generalized Log Likelihood Ratio (GLLR) statistic and can be used for detection of multiple anomalies. HDS is shown to be order-optimal in terms of the size of the search space, and asymptotically optimal in terms of detection accuracy. An explicit upper bound on the error probability is established for the finite sample regime. In addition to extensive experiments on synthetic datasets, experiments have been conducted on the DARPA intrusion detection dataset, showing that HDS is superior to existing methods.
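
To make the role of a generalized log-likelihood ratio under a composite hypothesis more concrete, the following is a minimal sketch and not the HDS algorithm itself, whose two GLLR variants and tree-search rule are not specified in this abstract. It assumes, purely for illustration, i.i.d. Gaussian observations with known standard deviation `sigma`, a normal process with mean `null_mean`, and an anomalous process whose mean is unknown. The function names `gaussian_gllr` and `sequential_decision` and the fixed stopping threshold are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple


def gaussian_gllr(samples: Sequence[float],
                  null_mean: float = 0.0,
                  sigma: float = 1.0) -> float:
    """GLLR of 'unknown mean' against 'mean == null_mean'.

    Assumes i.i.d. Gaussian samples with known sigma (an illustrative
    assumption). The unknown mean is replaced by its maximum-likelihood
    estimate, the sample mean, which gives the closed form
    n * (sample_mean - null_mean)^2 / (2 * sigma^2).
    """
    n = len(samples)
    if n == 0:
        return 0.0
    mle_mean = sum(samples) / n
    return n * (mle_mean - null_mean) ** 2 / (2.0 * sigma ** 2)


def sequential_decision(stream: Iterable[float],
                        threshold: float,
                        null_mean: float = 0.0,
                        sigma: float = 1.0) -> Tuple[str, int]:
    """Take samples one at a time and stop once the GLLR crosses threshold.

    Returns ("anomalous", n) if the statistic reaches the threshold after
    n samples, or ("undecided", n) if the stream is exhausted first.
    The threshold is a hypothetical tuning parameter.
    """
    seen: List[float] = []
    for y in stream:
        seen.append(y)
        if gaussian_gllr(seen, null_mean, sigma) >= threshold:
            return "anomalous", len(seen)
    return "undecided", len(seen)


if __name__ == "__main__":
    # A process whose observations sit away from the null mean is flagged
    # after a few samples; one sitting exactly at the null mean is not.
    print(sequential_decision([1.0] * 10, threshold=3.0))
    print(sequential_decision([0.0] * 5, threshold=3.0))
```

In this sketch a larger threshold trades more samples for fewer false alarms, which is the accuracy versus sample-complexity trade-off the abstract refers to. A hierarchical strategy would apply such a statistic to aggregated observations at the nodes of the tree, which is beyond what this example shows.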
