Advancements in reinforcement learning (RL) have inspired new directions in intelligent automation of network defense. However, many of these advancements have either outpaced their application to network security or have not considered the challenges of implementing them in the real world. To understand these problems, this work evaluates several RL approaches implemented in the second edition of the CAGE Challenge, a public competition to build an autonomous network defender agent in a high-fidelity network simulator. Our approaches all build on the Proximal Policy Optimization (PPO) family of algorithms and include hierarchical RL, action masking, custom training, and ensemble RL. We find that the ensemble RL technique performs strongest, outperforming our other models and taking second place in the competition. To understand applicability to real environments, we evaluate each method's ability to generalize to unseen networks and against an unknown attack strategy. In unseen environments, all of our approaches perform worse, with the degree of degradation varying by the type of environmental change. Against an unknown attacker strategy, our models show reduced overall performance even though the new strategy is less efficient than the ones our models were trained against. Together, these results highlight promising research directions for autonomous network defense in the real world.