We study a Stackelberg game between one attacker and one defender in a configurable environment. The defender picks a specific environment configuration; the attacker observes the configuration and attacks it via Reinforcement Learning (RL), training against the observed environment. The defender's goal is to find the environment that minimizes the attacker's achievable reward. We apply Evolutionary Diversity Optimization (EDO) to generate a diverse population of environments for training. Environments on which the attacker clearly achieves high reward are killed off and replaced by new offspring, so that training time is not wasted on them. Diversity not only improves training quality but also fits our RL scenario well: RL agents tend to improve gradually, so an environment that looks slightly worse early on may turn out to be better later. We demonstrate the effectiveness of our approach on a specific application, Active Directory (AD), the default security management system for Windows domain networks. An AD environment is described by an attack graph, in which nodes represent computers, accounts, etc., and edges represent accesses. The attacker aims to find the best attack path to reach the highest-privilege node. The defender can change the graph by removing a limited number of edges (revoking accesses). Our approach generates better defensive plans than the existing approach and scales better.
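To make the EDO procedure concrete, the following is a minimal, self-contained sketch of the generate-evaluate-replace loop described above. It is not the paper's implementation: the edge encoding, the population and budget constants, and the helper names (random_env, mutate, attacker_reward, diversity) are all illustrative assumptions, and a random proxy stands in for the expensive step of training an RL attacker against each candidate environment.

```python
import random

# Illustrative constants (assumptions, not taken from the paper).
NUM_EDGES = 20    # edges eligible for removal in the attack graph
BUDGET = 5        # the defender may remove at most this many edges
POP_SIZE = 8      # environments kept alive at any time
GENERATIONS = 50  # evolutionary iterations

def random_env():
    """An environment = the set of edges the defender removes."""
    return frozenset(random.sample(range(NUM_EDGES), BUDGET))

def mutate(env):
    """Offspring: swap one removed edge for a currently kept one."""
    removed = list(env)
    kept = [e for e in range(NUM_EDGES) if e not in env]
    removed[random.randrange(len(removed))] = random.choice(kept)
    return frozenset(removed)

def attacker_reward(env):
    """Placeholder for the RL attacker's achievable reward on env.
    The real system trains an RL agent against env; lower is better
    for the defender. A random proxy keeps this sketch runnable."""
    return random.random()

def diversity(env, population):
    """Mean set-difference distance from env to the rest of the population."""
    others = [e for e in population if e is not env]
    if not others:
        return 0.0
    return sum(len(env ^ other) for other in others) / len(others)

population = [random_env() for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    # Generate one offspring per generation from a random parent.
    population.append(mutate(random.choice(population)))
    # Kill off the environment with the clearly highest attacker reward,
    # preferring to keep individuals that contribute more diversity.
    worst = max(population,
                key=lambda e: (attacker_reward(e), -diversity(e, population)))
    population.remove(worst)

best = min(population, key=attacker_reward)
print("Defender's plan (edges to remove):", sorted(best))
```

In the real system the fitness evaluation is the dominant cost, since each call means RL training against one environment; this is why environments with clearly high attacker rewards are discarded early rather than trained to convergence.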