Defending computer networks from cyber attack requires timely responses to alerts and threat intelligence. Decisions about how to respond involve coordinating actions across multiple nodes based on imperfect indicators of compromise, while minimizing disruptions to network operations. Currently, playbooks are used to automate portions of a response process, but they often leave complex decision-making to a human analyst. In this work, we present a deep reinforcement learning approach to autonomous response and recovery in large industrial control networks. We propose an attention-based neural architecture that is flexible to the size of the network under protection. To train and evaluate the autonomous defender agent, we present an industrial control network simulation environment suitable for reinforcement learning. Experiments show that the learned agent can effectively mitigate advanced attacks that progress with few observable signals over several months before execution. The proposed deep reinforcement learning approach outperforms a fully automated playbook method in simulation, taking less-disruptive actions while also defending more nodes on the network. The learned policy is also more robust to changes in attacker behavior than playbook approaches.