With growing system complexity and attack sophistication, the need for autonomous cyber defense has become evident for cyber and cyber-physical systems (CPSs). Many state-of-the-art frameworks either rely on static models with unrealistic assumptions or fail to satisfy system safety and security requirements. In this paper, we present a new hybrid autonomous agent architecture that aims to optimize and verify the defense policies of reinforcement learning (RL) by incorporating constraint verification, using satisfiability modulo theories (SMT), into the agent's decision loop. The incorporation of SMT not only ensures that safety and security requirements are satisfied, but also provides continuous feedback that steers RL decision-making toward safe and effective actions. This approach is critically needed for CPSs, in which safety or security violations carry high risk. Our evaluation of the presented approach in a simulated CPS environment shows that the agent learns the optimal policy quickly and defeats diversified attack strategies in 99\% of cases.
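The decision loop described above, in which each candidate action is verified against safety constraints before execution and rejected actions are fed back to the learner, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy tank-level plant, the action set, the penalty value, the use of tabular Q-learning, and the function `is_safe`, which is a plain Python predicate standing in for the SMT query.

```python
import random

# Hypothetical toy plant (assumption): a tank level that must stay in a safe band.
ACTIONS = (-2, -1, 0, 1, 2)            # level change applied by the defender
SAFE_LOW, SAFE_HIGH, TARGET = 2, 8, 5
PENALTY = -100.0                       # assumed feedback value for rejected actions


def is_safe(level, action, bound=1):
    """Stand-in for the SMT query (assumption): the next level must stay in
    the safe band for every disturbance in [-bound, +bound]."""
    nxt = level + action
    return SAFE_LOW <= nxt - bound and nxt + bound <= SAFE_HIGH


def choose_action(q, level, epsilon, rng):
    """Epsilon-greedy choice restricted to verified actions. Actions rejected
    by the check are penalized so the learner is steered away from them."""
    ranked = sorted(ACTIONS, key=lambda a: q[(level, a)], reverse=True)
    if rng.random() < epsilon:
        rng.shuffle(ranked)            # explore: try actions in random order
    for action in ranked:
        if is_safe(level, action):
            return action
        q[(level, action)] = min(q[(level, action)], PENALTY)
    raise RuntimeError("no action satisfies the safety constraint")


def train(episodes=200, steps=20, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with the verifier inside the decision loop."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(11) for a in ACTIONS}
    violations = 0
    for _ in range(episodes):
        level = TARGET
        for _ in range(steps):
            action = choose_action(q, level, epsilon, rng)
            attack = rng.choice((-1, 0, 1))    # bounded adversarial disturbance
            nxt = level + action + attack
            if not SAFE_LOW <= nxt <= SAFE_HIGH:
                violations += 1
            reward = -abs(nxt - TARGET)
            # Bootstrap only from actions that would pass verification.
            best_next = max(q[(nxt, a)] for a in ACTIONS if is_safe(nxt, a))
            q[(level, action)] += alpha * (reward + gamma * best_next - q[(level, action)])
            level = nxt
    return q, violations


if __name__ == "__main__":
    q_table, violations = train()
    print("safety violations during training:", violations)
```

Because every executed action passes the check under the assumed disturbance bound, exploration itself never leaves the safe band in this toy setting. In a real system the predicate would presumably be replaced by a solver query over the actual safety and security requirements.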