We address the problem of monitoring a set of binary stochastic processes and generating an alert when the number of anomalies among them exceeds a threshold. To this end, the decision-maker selects and probes a subset of the processes to obtain noisy estimates of their states (normal or anomalous). Based on the received observations, the decision-maker first determines whether to declare that the number of anomalies has exceeded the threshold or to continue taking observations. When the decision is to continue, it next decides whether to collect observations at the next time instant or to defer them to a later time. If it chooses to collect observations, it further determines the subset of processes to be probed. To devise this three-step sequential decision-making procedure, we adopt a Bayesian formulation in which we learn the posterior probability over the states of the processes. Using this posterior probability, we construct a Markov decision process and solve it using deep actor-critic reinforcement learning. Via numerical experiments, we demonstrate the superior performance of our algorithm compared to traditional model-based algorithms.
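To make the Bayesian belief-update step concrete, the sketch below shows one plausible way the posterior anomaly probabilities could be maintained for a set of binary Markov processes under noisy probing. All specifics (the number of processes K, the observation flip probability RHO, the transition matrix P, and the uniform prior) are illustrative assumptions, not values from the paper; the resulting belief vector is the kind of quantity that would serve as the MDP state fed to the actor-critic policy.

```python
import numpy as np

K = 10            # number of binary processes (assumed)
RHO = 0.2         # probability a probe reports the wrong state (assumed)
# Per-process Markov transition matrix: P[i, j] = Pr(next = j | current = i)
P = np.array([[0.9, 0.1],    # normal -> {normal, anomalous}
              [0.2, 0.8]])   # anomalous -> {normal, anomalous}

def predict(belief):
    """Propagate Pr(state = anomalous) one step through the Markov chain."""
    return belief * P[1, 1] + (1.0 - belief) * P[0, 1]

def update(belief, probed, obs):
    """Bayes update of the anomaly probabilities for the probed processes.

    belief: array of Pr(state = anomalous), one entry per process
    probed: boolean mask of processes probed at this time instant
    obs:    noisy binary observations (1 = anomalous) for the probed entries
    """
    b = belief.copy()
    like1 = np.where(obs == 1, 1.0 - RHO, RHO)   # Pr(obs | anomalous)
    like0 = np.where(obs == 1, RHO, 1.0 - RHO)   # Pr(obs | normal)
    num = b[probed] * like1
    b[probed] = num / (num + (1.0 - b[probed]) * like0)
    return b

# One monitoring step: predict, probe a subset, update the posterior.
rng = np.random.default_rng(0)
belief = np.full(K, 0.1)                        # prior anomaly probability (assumed)
belief = predict(belief)
probed = np.zeros(K, dtype=bool)
probed[rng.choice(K, size=3, replace=False)] = True   # subset chosen by the policy
obs = rng.integers(0, 2, size=probed.sum())           # stand-in observations
belief = update(belief, probed, obs)
```

Under this sketch, the three-step decision described above would operate on `belief`: declare an alert if the posterior probability that the anomaly count exceeds the threshold is high enough, otherwise choose between waiting and probing, and, if probing, select the mask `probed`.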