We study automated intrusion response and formulate the interaction between an attacker and a defender as an optimal stopping game in which attack and defense strategies evolve through reinforcement learning and self-play. The game-theoretic modeling enables us to find defender strategies that are effective against a dynamic attacker, i.e., an attacker that adapts its strategy in response to the defender's strategy. Further, the optimal stopping formulation allows us to prove that optimal strategies have threshold properties. To obtain near-optimal defender strategies, we develop Threshold Fictitious Self-Play (T-FP), a fictitious self-play algorithm that learns Nash equilibria through stochastic approximation. We show that T-FP outperforms a state-of-the-art algorithm on our use case. The experimental part of this investigation includes two systems: a simulation system, where defender strategies are incrementally learned, and an emulation system, where statistics are collected that drive simulation runs and where learned strategies are evaluated. We argue that this approach can produce effective defender strategies for a practical IT infrastructure.
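The core idea behind learning equilibria through self-play can be illustrated with classical fictitious play: each player repeatedly best-responds to the opponent's empirical (time-averaged) strategy, and in zero-sum games the empirical averages converge to a Nash equilibrium. The toy sketch below runs fictitious play on matching pennies; it is an illustrative assumption-laden stand-in, not the paper's T-FP algorithm (which learns threshold strategies via stochastic approximation rather than computing exact best responses on a matrix game).

```python
import numpy as np

# Fictitious self-play on matching pennies (zero-sum).
# Each iteration, both players best-respond to the opponent's
# empirical average strategy; the averages converge to the
# Nash equilibrium (0.5, 0.5).
A = np.array([[1.0, -1.0], [-1.0, 1.0]])  # row player's payoffs

counts_row = np.ones(2)  # empirical action counts (uniform prior)
counts_col = np.ones(2)

for _ in range(5000):
    avg_col = counts_col / counts_col.sum()
    avg_row = counts_row / counts_row.sum()
    br_row = int(np.argmax(A @ avg_col))     # row maximizes its payoff
    br_col = int(np.argmax(-(avg_row @ A)))  # column minimizes row's payoff
    counts_row[br_row] += 1
    counts_col[br_col] += 1

print(counts_row / counts_row.sum())  # approaches [0.5, 0.5]
```

In the paper's setting, the strategy space is restricted to threshold policies (justified by the proven threshold properties of optimal strategies), which reduces the best-response computation to a low-dimensional search.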