We study automated intrusion response and formulate the interaction between an attacker and a defender as an optimal stopping game where attack and defense strategies evolve through reinforcement learning and self-play. The game-theoretic modeling enables us to find defender strategies that are effective against a dynamic attacker, i.e., an attacker that adapts its strategy in response to the defender's strategy. Further, the optimal stopping formulation allows us to prove that optimal strategies have threshold properties. To obtain near-optimal defender strategies, we develop Threshold Fictitious Self-Play (T-FP), a fictitious self-play algorithm that learns Nash equilibria through stochastic approximation. We show that T-FP outperforms a state-of-the-art algorithm for our use case. The experimental part of this investigation includes two systems: a simulation system where defender strategies are incrementally learned, and an emulation system where the statistics that drive the simulation runs are collected and where the learned strategies are evaluated. We argue that this approach can produce effective defender strategies for a practical IT infrastructure.
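To make two of the ideas above concrete, the sketch below illustrates (i) a defender strategy with a threshold property, i.e., the defender acts once its belief that an intrusion is ongoing exceeds a threshold, and (ii) a fictitious-play averaging step of the kind performed through stochastic approximation. This is a minimal illustration under assumptions, not the paper's implementation; the function names, the step-size schedule, and the placeholder best response are introduced here purely for exposition.

```python
import numpy as np

# Illustrative sketch (not the paper's implementation): a defender strategy
# with a threshold property -- act (stop) once the belief that an intrusion
# is ongoing exceeds a threshold alpha.
def threshold_strategy(belief: float, alpha: float) -> int:
    """Return 1 (stop/defend) if belief >= alpha, else 0 (continue/monitor)."""
    return int(belief >= alpha)

# Fictitious-play style averaging via stochastic approximation: the average
# strategy (here reduced to an average threshold) tracks the running mean of
# the best-response thresholds computed at each iteration.
def fictitious_play_update(avg_alpha: float, best_response_alpha: float, k: int) -> float:
    step = 1.0 / (k + 1)  # decreasing step size gamma_k
    return avg_alpha + step * (best_response_alpha - avg_alpha)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    avg_alpha = 0.5
    for k in range(100):
        # Placeholder best response: in T-FP this would instead be learned
        # with reinforcement learning against the opponent's average strategy.
        br_alpha = rng.uniform(0.6, 0.8)
        avg_alpha = fictitious_play_update(avg_alpha, br_alpha, k)
    print(f"averaged threshold: {avg_alpha:.3f}")
    print("action at belief 0.75:", threshold_strategy(0.75, avg_alpha))
```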