We implemented and evaluated an automated cyber defense agent. The agent takes security alerts as input and uses reinforcement learning to learn a policy for executing predefined defensive measures. The defender policies were trained in an environment intended to simulate a cyber attack. In the simulation, an attacking agent attempts to capture targets in the environment, while the defender attempts to protect them by enabling defenses. The environment was modeled using attack graphs based on the Meta Attack Language. We assumed that defensive measures incur downtime costs, meaning that the defender agent was penalized for using them. We also assumed that the environment was equipped with an imperfect intrusion detection system that occasionally produces erroneous alerts based on the environment state. To evaluate the setup, we trained the defensive agent under different volumes of intrusion detection system noise. We also trained agents against different attacker strategies and on attack graphs of different sizes. In our experiments, the defensive agent using policies trained with reinforcement learning outperformed agents using heuristic policies. The experiments also demonstrated that the learned policies could generalize across different attacker strategies. However, the performance of the learned policies decreased as the attack graphs increased in size.
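The described setup can be illustrated with a minimal sketch: a toy environment in which an attacker advances along a chain of attack steps toward a target, the defender may enable a defense at each step (at a downtime cost), and an imperfect intrusion detection system reports noisy alerts as the defender's observation. This is a hypothetical simplification for illustration only; the paper models the environment with full Meta Attack Language attack graphs, and all names, parameters, and the chain-shaped graph here are assumptions.

```python
import random

class ToyDefenseEnv:
    """Illustrative sketch (not the paper's implementation): an attacker
    moves along a chain-shaped attack graph toward a single target; the
    defender observes noisy IDS alerts and may enable defenses."""

    def __init__(self, n_steps=5, fpr=0.1, fnr=0.1,
                 downtime_cost=1.0, capture_penalty=50.0, seed=0):
        self.n_steps = n_steps              # length of the attack chain
        self.fpr, self.fnr = fpr, fnr       # IDS false-positive / false-negative rates
        self.downtime_cost = downtime_cost  # penalty per enabled defense, per step
        self.capture_penalty = capture_penalty
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        self.attacker_pos = 0               # how far the attacker has progressed
        self.defended = [False] * self.n_steps
        return self._observe()

    def _observe(self):
        # Imperfect IDS: compromised steps may be missed (false negatives),
        # clean steps may raise spurious alerts (false positives).
        alerts = []
        for i in range(self.n_steps):
            compromised = i < self.attacker_pos
            p_alert = (1 - self.fnr) if compromised else self.fpr
            alerts.append(1 if self.rng.random() < p_alert else 0)
        return alerts

    def step(self, action):
        # action: index of a defense to enable, or None to wait
        if action is not None:
            self.defended[action] = True
        # The attacker advances one step unless that step is defended.
        if self.attacker_pos < self.n_steps and not self.defended[self.attacker_pos]:
            self.attacker_pos += 1
        captured = self.attacker_pos == self.n_steps
        # Reward: downtime cost for every enabled defense, large penalty on capture.
        reward = -self.downtime_cost * sum(self.defended)
        if captured:
            reward -= self.capture_penalty
        return self._observe(), reward, captured
```

A policy (heuristic or learned) maps the alert vector to a defense choice; the reward structure captures the paper's trade-off, since enabling defenses early stops the attacker but accumulates downtime cost, while waiting risks the capture penalty.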