The problem of sequential anomaly detection is considered, where multiple data sources are monitored in real time and the goal is to identify the "anomalous" ones among them when it is not possible to sample all sources at all times. A detection scheme in this context must specify not only when to stop sampling and which sources to identify as anomalous upon stopping, but also which sources to sample at each time instance until stopping. A novel formulation of this problem is proposed, in which the number of anomalous sources is not necessarily known in advance and the number of sampled sources per time instance is not necessarily fixed. Instead, arbitrary lower and upper bounds are assumed on the number of anomalous sources, and the ratio of the expected number of samples to the expected time until stopping is required not to exceed an arbitrary, user-specified level. In addition to this sampling constraint, the probabilities of at least one false alarm and at least one missed detection are controlled below user-specified tolerance levels. A general criterion is established for a policy to achieve the minimum expected time until stopping, to a first-order asymptotic approximation, as both familywise error rates go to zero. This criterion is used to prove the asymptotic optimality of a family of policies that sample each source at each time instance with a probability that depends on the past observations only through the current estimate of the subset of anomalous sources. In particular, asymptotic optimality is established for a policy that requires minimal computation under any setup of the problem.
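To make the setting concrete, here is a minimal simulation sketch under stated assumptions; it is not the policy analyzed in the abstract. It assumes each normal source emits N(0, 1) observations and each anomalous source emits N(mu, 1) with a known shift mu. The current estimate of the anomalous subset is the likelihood-maximizing subset whose size lies within the assumed bounds [lower, upper]. Per-source sampling probabilities are computed from that estimate alone. All function names, the uniform sampling rule `uniform_probs`, and the thresholds `a` and `b` of the intersection-type stopping rule are hypothetical choices for illustration.

```python
import random


def llr(x, mu):
    """Log-likelihood ratio of N(mu, 1) against N(0, 1) for one observation x."""
    return mu * x - mu * mu / 2.0


def estimate_subset(stats, lower, upper):
    """Subset A maximizing the sum of stats[i] over i in A, with lower <= |A| <= upper.

    Under the assumed Gaussian model this is the maximum-likelihood estimate:
    keep every source with a positive cumulative LLR, then pad or trim the
    top-ranked list so that its size lies within [lower, upper].
    """
    order = sorted(range(len(stats)), key=lambda i: stats[i], reverse=True)
    n_positive = sum(1 for s in stats if s > 0)
    k = min(max(n_positive, lower), upper)
    return set(order[:k])


def uniform_probs(estimate, n_sources, budget):
    """Hypothetical rule: ignore the estimate and sample every source w.p. budget / n_sources."""
    return [budget / n_sources] * n_sources


def run_policy(anomalous, n_sources, lower, upper, budget, mu=1.0, a=5.0, b=5.0,
               sampling_probs=uniform_probs, seed=0, max_steps=100_000):
    """Simulate one run; return (final estimate, stopping time, total samples)."""
    rng = random.Random(seed)
    stats = [0.0] * n_sources  # cumulative LLR of each source
    estimate = estimate_subset(stats, lower, upper)
    n_samples = 0
    for t in range(1, max_steps + 1):
        # Sampling probabilities depend on the past only through the current estimate.
        probs = sampling_probs(estimate, n_sources, budget)
        for i in range(n_sources):
            if rng.random() < probs[i]:
                mean = mu if i in anomalous else 0.0
                stats[i] += llr(rng.gauss(mean, 1.0), mu)
                n_samples += 1
        estimate = estimate_subset(stats, lower, upper)
        # Hypothetical stopping rule: every source in the estimate has LLR >= b
        # and every source outside it has LLR <= -a.
        if all(stats[i] >= b for i in estimate) and all(
            stats[j] <= -a for j in range(n_sources) if j not in estimate
        ):
            return estimate, t, n_samples
    return estimate, max_steps, n_samples


est, t, n = run_policy({1, 4}, n_sources=10, lower=1, upper=3, budget=4,
                       mu=2.0, a=20.0, b=20.0, seed=1)
print(sorted(est), t, n)
```

With the uniform rule, each time instance draws `budget` samples in expectation, so by Wald's identity the ratio of the expected number of samples to the expected stopping time equals `budget`, and the sampling constraint holds with equality. Other members of the family can be tried by passing a different `sampling_probs` function, as long as it uses only the current estimate and keeps the expected number of samples per step within the budget.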