Reinforcement Learning for Feedback-Enabled Cyber Resilience

Digitization and remote connectivity have enlarged the attack surface and made cyber systems more vulnerable. As attackers become increasingly sophisticated and resourceful, relying solely on traditional cyber protection, such as intrusion detection, firewalls, and encryption, is insufficient to secure cyber systems. Cyber resilience provides a new security paradigm that complements inadequate protection with resilience mechanisms. A Cyber-Resilient Mechanism (CRM) adapts to known or zero-day threats and uncertainties in real time and strategically responds to them to maintain the critical functions of the cyber systems in the event of successful attacks. Feedback architectures play a pivotal role in enabling the online sensing, reasoning, and actuation process of the CRM. Reinforcement Learning (RL) is an essential tool that epitomizes the feedback architectures for cyber resilience. It allows the CRM to provide sequential responses to attacks with limited or no prior knowledge of the environment and the attacker. In this work, we review the literature on RL for cyber resilience and discuss cyber resilience against three major types of vulnerabilities: posture-related, information-related, and human-related vulnerabilities. We introduce three application domains of CRMs: moving target defense, defensive cyber deception, and assistive human security technologies. RL algorithms also have vulnerabilities of their own. We explain the three vulnerabilities of RL and present attack models in which the attacker targets the information exchanged between the environment and the agent: the rewards, the state observations, and the action commands. We show that the attacker can trick the RL agent into learning a nefarious policy with minimum attacking effort. Lastly, we discuss the future challenges of RL for cyber security and resilience and emerging applications of RL-based CRMs.
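
To make the reward-targeting attack model concrete, the following is a minimal sketch and not the method of the reviewed works. It assumes a hypothetical two-state defender problem (a system that is either healthy or compromised, with actions "wait" and "restore"); the states, rewards, the attacker's offset of 3.0, and all function names are illustrative assumptions. A tabular Q-learning agent learns through a feedback loop, and an attacker sitting on the reward channel falsifies a single reward entry.

```python
# Hypothetical two-state defender MDP. All names and numbers are
# illustrative assumptions, not taken from the reviewed literature.
HEALTHY, COMPROMISED = 0, 1
WAIT, RESTORE = 0, 1
ACTIONS = (WAIT, RESTORE)

# Deterministic dynamics: (state, action) -> (next_state, true_reward)
DYNAMICS = {
    (HEALTHY, WAIT): (COMPROMISED, 1.0),      # service runs, attack lands
    (HEALTHY, RESTORE): (HEALTHY, 0.0),       # restore cost offsets service
    (COMPROMISED, WAIT): (COMPROMISED, -2.0),  # damage accumulates
    (COMPROMISED, RESTORE): (HEALTHY, -1.0),   # pay to recover
}


def no_attack(state, action, reward):
    """Clean feedback channel: the agent observes the true reward."""
    return reward


def reward_poisoning(state, action, reward):
    """Assumed attacker: inflates one reward entry on the channel."""
    if state == COMPROMISED and action == WAIT:
        return reward + 3.0
    return reward


def q_learning(reward_channel, sweeps=500, alpha=0.5, gamma=0.9):
    """Tabular Q-learning that visits every (state, action) pair per sweep."""
    q = {pair: 0.0 for pair in DYNAMICS}
    for _ in range(sweeps):
        for (state, action), (next_state, reward) in DYNAMICS.items():
            observed = reward_channel(state, action, reward)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            td_target = observed + gamma * best_next
            q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q


def greedy_action(q, state):
    """Action with the highest learned Q-value in the given state."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


clean_q = q_learning(no_attack)
poisoned_q = q_learning(reward_poisoning)

names = {WAIT: "wait", RESTORE: "restore"}
print("clean policy when compromised:   ", names[greedy_action(clean_q, COMPROMISED)])
print("poisoned policy when compromised:", names[greedy_action(poisoned_q, COMPROMISED)])
```

Under these assumed numbers, the agent trained on clean rewards restores a compromised system, while the agent trained on the falsified channel learns to leave it compromised, even though the attacker altered only one entry of the reward signal. The same loop could be adapted to sketch attacks on state observations or action commands by wrapping those channels instead of the reward.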
