We address the problem of sequentially selecting and observing processes from a given set in order to identify the anomalous ones. At each time instant, the decision-maker observes a subset of the processes and obtains a noisy binary indicator of whether or not each observed process is anomalous. In this setting, we develop an anomaly detection algorithm that chooses which processes to observe at each time instant, decides when to stop taking observations, and declares its decision on which processes are anomalous. The objective of the detection algorithm is to identify the anomalies with an accuracy exceeding a desired value while minimizing the delay in decision making. We devise a centralized algorithm, in which the processes are jointly selected by a common agent, as well as a decentralized algorithm, in which the decision of whether to select a process is made independently for each process. Our algorithms rely on a Markov decision process defined using the marginal probability of each process being normal or anomalous, conditioned on the observations. We implement the detection algorithms using the deep actor-critic reinforcement learning framework. Unlike prior work on this topic, which has exponential complexity in the number of processes, our algorithms have computational and memory requirements that are both polynomial in the number of processes. We demonstrate the efficacy of the algorithms via numerical experiments comparing them with state-of-the-art methods.
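As a concrete illustration of the marginal probabilities the abstract refers to, the sketch below shows a standard Bayesian update of the posterior probability that a single process is anomalous after one noisy binary observation. The symmetric flip-noise model (parameter `rho`) and the function name are assumptions for illustration only, not taken from the paper.

```python
def update_belief(p: float, y: int, rho: float) -> float:
    """Posterior P(anomalous | y) from prior belief p.

    y = 1 suggests "anomalous", but the indicator is flipped
    with probability rho (an assumed symmetric noise model).
    """
    like_anom = (1 - rho) if y == 1 else rho   # P(y | process anomalous)
    like_norm = rho if y == 1 else (1 - rho)   # P(y | process normal)
    num = p * like_anom
    return num / (num + (1 - p) * like_norm)


# From a uniform prior, a noisy "anomalous" reading raises the belief;
# repeated consistent readings drive it toward 1.
p = 0.5
for _ in range(3):
    p = update_belief(p, y=1, rho=0.2)
```

Maintaining one such scalar belief per process is what keeps the state polynomial in the number of processes, in contrast to tracking a joint posterior over all hypothesis combinations.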