Reinforcement learning (RL) operating on attack graphs leveraging cyber terrain principles is used to develop the reward and state associated with determining surveillance detection routes (SDR). This work extends previous efforts to develop RL methods for path analysis within enterprise networks. It focuses on building SDRs, that is, routes that explore network services while trying to evade risk. RL supports the development of these routes through a reward mechanism that helps realize such paths. The RL algorithm is modified to include a novel warm-up phase that, during initial exploration, decides which areas of the network are safe to explore based on the rewards and a penalty scale factor.
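To make the warm-up idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a tabular Q-learning agent on a tiny hypothetical network, a net reward of the form `service_value - penalty_scale * risk`, and a warm-up rule that marks a node as safe when its net reward is non-negative. The abstract specifies none of these details. All node names, scores, hyperparameters, and function names (`warm_up`, `train_route_policy`, `greedy_route`) are illustrative assumptions.

```python
import random

# Hypothetical toy network: node -> reachable neighbours (not from the paper).
graph = {
    "workstation": ["file_server", "monitored_server"],
    "file_server": ["workstation", "database"],
    "monitored_server": ["workstation", "database"],
    "database": ["file_server", "monitored_server"],
}
# Assumed per-node scores: value of exploring services, and detection risk.
service_value = {"workstation": 0.0, "file_server": 1.0,
                 "monitored_server": 1.0, "database": 2.0}
risk = {"workstation": 0.0, "file_server": 0.1,
        "monitored_server": 0.9, "database": 0.3}
penalty_scale = 2.0  # assumed penalty scale factor


def net_reward(node):
    """Assumed reward shape: service value minus scaled risk penalty."""
    return service_value[node] - penalty_scale * risk[node]


def warm_up(start, episodes=200, horizon=10, seed=0):
    """Random-walk exploration that keeps nodes whose net reward is >= 0."""
    rng = random.Random(seed)
    safe = {start}
    for _ in range(episodes):
        node = start
        for _ in range(horizon):
            node = rng.choice(graph[node])
            if net_reward(node) >= 0:
                safe.add(node)
    return safe


def train_route_policy(start, safe, episodes=500, horizon=6,
                       alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning restricted to the safe sub-graph from warm-up."""
    rng = random.Random(seed)
    allowed = {n: [m for m in graph[n] if m in safe] for n in safe}
    q = {(n, m): 0.0 for n in allowed for m in allowed[n]}
    for _ in range(episodes):
        node, visited = start, {start}
        for _ in range(horizon):
            moves = allowed[node]
            if not moves:
                break
            if rng.random() < epsilon:  # epsilon-greedy exploration
                nxt = rng.choice(moves)
            else:
                nxt = max(moves, key=lambda m: q[(node, m)])
            # Reward new nodes; apply a small step cost to revisits.
            reward = net_reward(nxt) if nxt not in visited else -0.1
            visited.add(nxt)
            future = max((q[(nxt, m)] for m in allowed[nxt]), default=0.0)
            q[(node, nxt)] += alpha * (reward + gamma * future - q[(node, nxt)])
            node = nxt
    return q, allowed


def greedy_route(q, allowed, start, horizon=6):
    """Follow the highest-valued unvisited safe neighbour from each node."""
    route, node = [start], start
    for _ in range(horizon):
        moves = [m for m in allowed[node] if m not in route]
        if not moves:
            break
        node = max(moves, key=lambda m: q[(node, m)])
        route.append(node)
    return route


safe = warm_up("workstation")
q, allowed = train_route_policy("workstation", safe)
route = greedy_route(q, allowed, "workstation")
print(sorted(safe), route)
```

In this sketch the penalty scale factor controls how aggressively the warm-up prunes the network: raising `penalty_scale` shrinks the safe set, while setting it to zero leaves every service-bearing node open to exploration. The state here is only the current node, which is a simplification chosen for brevity.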