This paper presents a secure reinforcement learning (RL) based control method for unknown linear time-invariant cyber-physical systems (CPSs) that are subjected to compositional attacks such as eavesdropping and covert attacks. We consider the attack scenario in which the attacker learns the dynamic model during the exploration phase of the learning conducted by the designer to obtain a linear quadratic regulator (LQR), and thereafter uses this information to launch a covert attack on the dynamic system; we refer to this as the doubly learning-based control and attack (DLCA) framework. We propose a dynamic camouflaging-based attack-resilient reinforcement learning (ARRL) algorithm that learns the desired optimal controller for the dynamic system while, at the same time, injecting sufficient misinformation into the attacker's estimate of the system dynamics. The algorithm is accompanied by theoretical guarantees and extensive numerical experiments on a consensus multi-agent system and on a benchmark power grid model.
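To make the DLCA setting concrete, the following is a minimal sketch (not the paper's ARRL algorithm) of the two learning processes it describes: the designer excites an unknown LTI plant, identifies (A, B) by least squares, and computes an LQR gain, while an eavesdropping attacker runs the same identification on the intercepted exploration data. A hypothetical state-dependent camouflage signal F_c x_k, known only to the designer, is added to the applied input, so the attacker's estimate of A is biased toward A + B F_c. The plant matrices, the gain F_c, and all signal choices are illustrative assumptions.

```python
# Sketch of learning-based control vs. learning-based (covert) attack under camouflage.
# Assumed example system and camouflage gain; not the paper's method.
import numpy as np
from scipy.linalg import solve_discrete_are

rng = np.random.default_rng(0)

# True (unknown to the learner) discrete-time LTI plant: x_{k+1} = A x_k + B u_k
A = np.array([[0.95, 0.10],
              [0.00, 0.80]])
B = np.array([[0.0],
              [0.1]])
n, m = A.shape[0], B.shape[1]

# Exploration phase: random probing input plus a secret camouflage term F_c x_k.
T = 500
F_c = np.array([[0.5, -0.3]])                # camouflage gain (designer's secret)
u_explore = rng.normal(size=(T, m))          # nominal exploration input (eavesdropped)
u_applied = np.zeros((T, m))                 # input actually applied to the plant
x = np.zeros((T + 1, n))
for k in range(T):
    u_applied[k] = u_explore[k] + F_c @ x[k]
    x[k + 1] = A @ x[k] + B @ u_applied[k]

def identify(states, inputs):
    """Least-squares fit of [A B] from x_{k+1} = A x_k + B u_k."""
    Z = np.hstack([states[:-1], inputs])     # regressors [x_k, u_k]
    Theta, *_ = np.linalg.lstsq(Z, states[1:], rcond=None)
    AB = Theta.T
    return AB[:, :n], AB[:, n:]

# Designer identifies with the true applied input, so its estimate is accurate.
A_hat, B_hat = identify(x, u_applied)

# Attacker only intercepts the nominal exploration input, so it effectively
# identifies the camouflaged dynamics A + B F_c instead of A.
A_atk, B_atk = identify(x, u_explore)

# Designer computes the LQR gain from its model estimate.
Q, R = np.eye(n), np.eye(m)
P = solve_discrete_are(A_hat, B_hat, Q, R)
K = np.linalg.solve(R + B_hat.T @ P @ B_hat, B_hat.T @ P @ A_hat)

print("designer's A error :", np.linalg.norm(A_hat - A))
print("attacker's A error :", np.linalg.norm(A_atk - A))   # close to ||B F_c||
print("LQR gain K:\n", K)
```

In this sketch the designer recovers (A, B) almost exactly and obtains a valid LQR gain, whereas the attacker's identified state matrix is shifted by roughly B F_c, which is the kind of misinformation the camouflaging strategy is meant to inject.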