Reinforcement Learning for Feedback-Enabled Cyber Resilience

The rapid growth in the number of devices and their connectivity has enlarged the attack surface and weakened cyber systems. As attackers become increasingly sophisticated and resourceful, relying solely on traditional cyber protection, such as intrusion detection, firewalls, and encryption, is insufficient to secure cyber systems. Cyber resilience provides a new security paradigm that complements inadequate protection with resilience mechanisms. A Cyber-Resilient Mechanism (CRM) adapts to known or zero-day threats and uncertainties in real time and responds to them strategically to maintain the critical functions of the cyber system. Feedback architectures play a pivotal role in enabling the online sensing, reasoning, and actuation of the CRM. Reinforcement Learning (RL) is an important class of algorithms that epitomizes the feedback architectures for cyber resiliency, allowing the CRM to provide dynamic and sequential responses to attacks with limited prior knowledge of the attacker. In this work, we review the literature on RL for cyber resiliency and discuss cyber-resilient defenses against three major types of vulnerabilities: posture-related, information-related, and human-related. We introduce moving target defense, defensive cyber deception, and assistive human security technologies as three application domains of CRMs to elaborate on their designs. RL itself is also vulnerable. We explain the major vulnerabilities of RL and present several attack models in which the attacks target the rewards, the measurements, and the actuators. We show that an attacker can trick the RL agent into learning a nefarious policy with minimal attack effort, which raises serious security concerns for RL-enabled systems. Finally, we discuss future challenges of RL for cyber security and resiliency, as well as emerging applications of RL-based CRMs.
