We study automated intrusion prevention using reinforcement learning. Following a novel approach, we formulate the interaction between an attacker and a defender as an optimal stopping game and let attack and defense strategies evolve through reinforcement learning and self-play. The game-theoretic perspective allows us to find defender strategies that are effective against dynamic attackers. The optimal stopping formulation gives us insight into the structure of optimal strategies, which we show to have threshold properties. To obtain the optimal defender strategies, we introduce T-FP, a fictitious self-play algorithm that learns Nash equilibria through stochastic approximation. We show that T-FP outperforms a state-of-the-art algorithm for our use case. Our overall method for learning and evaluating strategies includes two systems: a simulation system where defender strategies are incrementally learned and an emulation system where statistics are produced that drive simulation runs and where learned strategies are evaluated. We conclude that this approach can produce effective defender strategies for a practical IT infrastructure.
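The core learning idea, fictitious self-play converging to a Nash equilibrium, can be illustrated with a minimal sketch. This is not T-FP itself: it is classical fictitious play on a hypothetical 2x2 zero-sum matrix game (matching pennies), where each player repeatedly best-responds to the empirical frequency of the opponent's past actions.

```python
import numpy as np

# Hypothetical 2x2 zero-sum payoff matrix for the row player
# (matching pennies); not the paper's actual stopping game.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

def fictitious_play(A, iters=10000):
    """Classical fictitious play: each player best-responds to the
    empirical mixed strategy implied by the opponent's action history."""
    row_counts = np.zeros(A.shape[0])
    col_counts = np.zeros(A.shape[1])
    row_counts[0] = 1  # arbitrary initial actions
    col_counts[0] = 1
    for _ in range(iters):
        # Row player (maximizer) best-responds to the column player's
        # empirical frequencies.
        col_freq = col_counts / col_counts.sum()
        row_counts[np.argmax(A @ col_freq)] += 1
        # Column player (minimizer) best-responds to the row player's
        # empirical frequencies.
        row_freq = row_counts / row_counts.sum()
        col_counts[np.argmin(row_freq @ A)] += 1
    # Empirical action frequencies approximate a Nash equilibrium
    # in zero-sum games (Robinson, 1951).
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()

row_strategy, col_strategy = fictitious_play(A)
```

For matching pennies the empirical frequencies converge toward the unique mixed equilibrium (0.5, 0.5) for both players. T-FP follows the same best-response-to-empirical-play principle, but computes the best responses with reinforcement learning over the stopping game.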