We study automated intrusion prevention using reinforcement learning. Following a novel approach, we formulate the problem of intrusion prevention as an (optimal) multiple stopping problem. This formulation gives us insight into the structure of optimal policies, which we show to have threshold properties. For most practical cases, it is not feasible to obtain an optimal defender policy using dynamic programming. We therefore develop a reinforcement learning approach to approximate an optimal policy. Our method for learning and validating policies includes two systems: a simulation system where defender policies are incrementally learned and an emulation system where statistics are produced that drive simulation runs and where learned policies are evaluated. We show that our approach can produce effective defender policies for a practical IT infrastructure of limited size. Inspection of the learned policies confirms that they exhibit threshold properties.
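The threshold structure of a multiple stopping policy can be illustrated with a short sketch. It assumes a two-state hidden process (no intrusion / intrusion). The defender's belief `b` is the probability that an intrusion is in progress. An intrusion starts with a hypothetical per-step probability `p_start`, and observations are discrete IDS alert counts with made-up likelihoods. The function names, the observation model, and the threshold values are illustrative assumptions, not the paper's. With `l` stops remaining, the policy takes a defensive action exactly when `b` reaches a threshold that depends on `l`.

```python
from typing import List, Mapping, Sequence

# Hypothetical observation model: P(number of IDS alerts | state). Values are illustrative only.
LIK_NO_INTRUSION = {0: 0.7, 1: 0.2, 2: 0.1}
LIK_INTRUSION = {0: 0.1, 1: 0.3, 2: 0.6}


def update_belief(b: float, obs: int, p_start: float,
                  lik1: Mapping[int, float], lik0: Mapping[int, float]) -> float:
    """Bayes filter for a two-state hidden process (0 = no intrusion, 1 = intrusion).

    b is the prior probability that an intrusion is in progress.
    """
    # Prediction: an ongoing intrusion persists; otherwise one starts with probability p_start.
    prior = b + (1.0 - b) * p_start
    # Correction: reweight by how likely the observation is under each state.
    num = prior * lik1.get(obs, 0.0)
    den = num + (1.0 - prior) * lik0.get(obs, 0.0)
    return num / den if den > 0.0 else prior


def threshold_policy(b: float, stops_left: int, thresholds: Sequence[float]) -> bool:
    """Stop (take a defensive action) iff b reaches the threshold for the stops left.

    thresholds[l - 1] is the (hypothetical) threshold used when l stops remain.
    """
    if stops_left <= 0:
        return False
    return b >= thresholds[stops_left - 1]


def run_episode(observations: Sequence[int], thresholds: Sequence[float],
                p_start: float = 0.1, b0: float = 0.0) -> List[int]:
    """Return the 1-based time steps at which the defender stops."""
    b = b0
    stops_left = len(thresholds)
    stop_times: List[int] = []
    for t, obs in enumerate(observations, start=1):
        b = update_belief(b, obs, p_start, LIK_INTRUSION, LIK_NO_INTRUSION)
        if threshold_policy(b, stops_left, thresholds):
            stop_times.append(t)
            stops_left -= 1
            if stops_left == 0:
                break
    return stop_times


if __name__ == "__main__":
    # Two stops: threshold 0.6 while 2 stops remain, 0.9 when 1 remains (made-up values).
    print(run_episode([0, 0, 2, 2, 2, 2], thresholds=(0.9, 0.6)))  # prints [4, 5]
```

In this sketch the thresholds are fixed by hand only to show the shape of the decision rule. In the approach described above, the policy is instead approximated with reinforcement learning and then inspected for this kind of structure.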