The problem of sequential anomaly detection and identification is considered in the presence of a sampling constraint. Specifically, multiple data streams are generated by distinct sources, and the goal is to quickly identify those that exhibit ``anomalous'' behavior when it is not possible to sample every source at each time instant. Thus, in addition to a stopping rule, which determines when to stop sampling, and a decision rule, which indicates which sources to identify as anomalous upon stopping, one needs to specify a sampling rule that determines which sources to sample at each time instant. The focus of this work is on ordering sampling rules, which sample, among the sources currently estimated as anomalous (resp. non-anomalous), those whose local test statistics have the smallest (resp. largest) values. It is shown that with an appropriate design, which is specified explicitly, an ordering sampling rule achieves the optimal expected stopping time among all policies that satisfy the same sampling and error constraints, to a first-order asymptotic approximation as the controlled false positive and false negative error rates both go to zero. This is the first asymptotic optimality result for ordering sampling rules when multiple sources can be sampled per time instant. Moreover, it is established under a general setup in which the number of anomalies need not be known a priori. A novel proof technique is introduced, which unifies different versions of the problem with respect to the homogeneity of the sources and the prior information on the number of anomalies.
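As an illustration only, the selection step of an ordering sampling rule can be sketched as follows. The function name `ordering_sample`, the `threshold` used to estimate which sources are anomalous, and the per-group sample counts `n_from_anomalous` and `n_from_normal` are all hypothetical: the explicit design that yields asymptotic optimality is not stated in this abstract and is not reproduced here.

```python
from typing import List, Sequence


def ordering_sample(
    stats: Sequence[float],
    n_from_anomalous: int,
    n_from_normal: int,
    threshold: float = 0.0,
) -> List[int]:
    """Return the indices of the sources to sample at the next time instant.

    Assumption (not from the source): a source is currently estimated as
    anomalous when its local test statistic exceeds `threshold`, and the
    number of sources drawn from each group is supplied by the caller.
    """
    anomalous = [i for i, s in enumerate(stats) if s > threshold]
    normal = [i for i, s in enumerate(stats) if s <= threshold]

    # Among sources estimated as anomalous, prefer the smallest statistics.
    anomalous.sort(key=lambda i: stats[i])
    # Among sources estimated as non-anomalous, prefer the largest statistics.
    normal.sort(key=lambda i: stats[i], reverse=True)

    chosen = anomalous[:n_from_anomalous] + normal[:n_from_normal]
    return sorted(chosen)


if __name__ == "__main__":
    local_stats = [2.5, -1.0, 0.3, -0.2, 4.0]
    print(ordering_sample(local_stats, n_from_anomalous=1, n_from_normal=1))
```

In this sketch the sampled sources are those whose current classification is least certain, that is, whose statistics lie closest to the threshold within each group.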