Defending computer networks from cyber attack requires coordinating actions across multiple nodes based on imperfect indicators of compromise while minimizing disruptions to network operations. Advanced attacks can progress with few observable signals over several months before execution. The resulting sequential decision problem has large observation and action spaces and a long time horizon, making it difficult to solve with existing methods. In this work, we present techniques to scale deep reinforcement learning to solve the cyber security orchestration problem for large industrial control networks. We propose a novel attention-based neural architecture with size complexity that is invariant to the size of the network under protection. A pre-training curriculum is presented to overcome early exploration difficulty. Experiments in simulation show that the proposed approaches greatly improve both learning sample complexity and converged policy performance over baseline methods.
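The abstract does not describe the architecture in detail, so the following is only an illustrative sketch of why an attention layer can have a parameter count that does not depend on the number of nodes: the same projection matrices are shared across all nodes, and only the activations grow with the network. Every name here (`init_params`, `node_action_logits`, the feature and hidden dimensions, the three action types) is an assumption made for illustration and is not taken from the paper.

```python
import math
import random


def init_matrix(rows, cols, rng):
    """Random (rows x cols) weight matrix as a list of lists."""
    scale = 1.0 / math.sqrt(rows)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def init_params(feature_dim, hidden_dim, num_action_types, seed=0):
    """Weights shared by every node; none of the shapes mention the node count."""
    rng = random.Random(seed)
    return {
        "W_q": init_matrix(feature_dim, hidden_dim, rng),
        "W_k": init_matrix(feature_dim, hidden_dim, rng),
        "W_v": init_matrix(feature_dim, hidden_dim, rng),
        "W_out": init_matrix(hidden_dim, num_action_types, rng),
    }


def node_action_logits(params, node_features):
    """Single-head self-attention over nodes, then per-node action logits.

    node_features: one feature vector per node (hypothetical alert indicators).
    Returns one row of logits per node, so the action space grows with the
    network while the parameters stay fixed.
    """
    q = matmul(node_features, params["W_q"])
    k = matmul(node_features, params["W_k"])
    v = matmul(node_features, params["W_v"])
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)  # (num_nodes x num_nodes)
    weights = [softmax([s / scale for s in row]) for row in scores]
    context = matmul(weights, v)  # each node summarizes all other nodes
    return matmul(context, params["W_out"])


def param_count(params):
    """Total number of scalar weights in the layer."""
    return sum(len(w) * len(w[0]) for w in params.values())


def random_nodes(num_nodes, feature_dim, seed):
    """Synthetic per-node observations, used only to exercise the sketch."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(feature_dim)] for _ in range(num_nodes)]


params = init_params(feature_dim=8, hidden_dim=16, num_action_types=3)
small = node_action_logits(params, random_nodes(10, 8, seed=1))
large = node_action_logits(params, random_nodes(100, 8, seed=2))
print(param_count(params), len(small), len(large))
```

The same `params` dictionary serves both the 10-node and the 100-node input, which is the property the abstract refers to as size invariance. Note that compute and memory in this sketch still grow with the number of nodes (the score matrix is quadratic); only the number of learned weights is fixed.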