Penetration testing, the organised attack of a computer system in order to test existing defences, has been used extensively to evaluate network security. It is a time-consuming process that requires in-depth knowledge to establish a strategy resembling a real cyber-attack. This paper presents a novel deep reinforcement learning architecture with hierarchically structured agents, called HA-DRL, which employs an algebraic action decomposition strategy to address the large discrete action space of an autonomous penetration testing simulator, where the number of actions increases exponentially with the complexity of the designed cybersecurity network. The proposed architecture is shown to find the optimal attacking policy faster and more stably than a conventional deep Q-learning agent, which is commonly used to apply artificial intelligence to automatic penetration testing.
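The abstract does not spell out the decomposition itself, so the following is only an illustrative sketch of one common way to factor a large discrete action space: treating a flat action index as a mixed-radix number whose digits are the sub-actions chosen at each level of a hierarchy. The function names `decompose` and `compose` and the example sizes `(4, 5, 3)` are hypothetical and are not taken from the paper.

```python
from math import prod


def decompose(flat_action, radices):
    """Split a flat action index into one sub-action per level (mixed radix).

    `radices` gives the number of choices at each level, most significant
    level first. This is an assumed scheme for illustration only.
    """
    if not 0 <= flat_action < prod(radices):
        raise ValueError("flat_action is outside the action space")
    parts = []
    # Peel off digits starting from the least significant level.
    for radix in reversed(radices):
        flat_action, digit = divmod(flat_action, radix)
        parts.append(digit)
    return tuple(reversed(parts))


def compose(parts, radices):
    """Inverse of decompose: rebuild the flat index from the sub-actions."""
    flat = 0
    for digit, radix in zip(parts, radices):
        if not 0 <= digit < radix:
            raise ValueError("sub-action is outside its level's range")
        flat = flat * radix + digit
    return flat


# Hypothetical sizes: three levels with 4, 5 and 3 choices respectively.
radices = (4, 5, 3)
sub_actions = decompose(37, radices)
flat_again = compose(sub_actions, radices)
```

Under this assumed scheme, a single agent would need one output per flat action (4 × 5 × 3 = 60 here), whereas three level-specific agents would need only 4 + 5 + 3 = 12 outputs in total. This is the kind of saving that motivates decomposing an action space whose size grows exponentially with network complexity.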