Due to the proliferation of renewable energy and its intrinsic intermittency and stochasticity, modern power systems face severe operational challenges. Data-driven decision-making algorithms from reinforcement learning (RL) offer a promising path toward operating a clean energy system efficiently. Although RL algorithms achieve performance competitive with model-based control methods, their robustness in safety-critical physical systems has received limited investigation. In this work, we first show that several competition-winning, state-of-the-art RL agents proposed for power system control are vulnerable to adversarial attacks. Specifically, we use an adversarial Markov decision process to learn an attack policy, and demonstrate its potency by successfully attacking multiple winning agents from the Learning To Run a Power Network (L2RPN) challenge under both white-box and black-box settings. We then propose adversarial training to increase the robustness of RL agents against such attacks and to avoid infeasible operational decisions. To the best of our knowledge, our work is the first to highlight the fragility of grid-control RL algorithms and to contribute an effective defense scheme for improving their robustness and security.
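To make the attack formulation concrete, the following is a minimal sketch of an adversary MDP, assuming a gym-style `reset`/`step` environment and a fixed victim policy exposing an `act` method (these interfaces and the `epsilon` budget are illustrative assumptions, not the paper's implementation). The adversary's action is a bounded perturbation of the victim's observation, and its reward is the negative of the operator's reward, so any standard RL algorithm trained on this wrapper learns an attack policy.

```python
import numpy as np

class AdversaryMDP:
    """Hypothetical wrapper that turns attacking a fixed grid-control
    policy into an RL problem: the adversary perturbs observations,
    the victim acts on the perturbed input, and the adversary earns
    the negative of the victim's reward."""

    def __init__(self, env, victim, epsilon=0.05):
        self.env = env          # underlying power-grid environment
        self.victim = victim    # fixed, pre-trained control policy
        self.epsilon = epsilon  # L-infinity budget on the perturbation
        self.obs = None

    def reset(self):
        self.obs = self.env.reset()
        return self.obs

    def step(self, delta):
        # Clip the adversary's action to the allowed attack budget.
        delta = np.clip(delta, -self.epsilon, self.epsilon)
        # Black-box setting: only the victim's chosen action is needed,
        # not its gradients or internal parameters.
        victim_action = self.victim.act(self.obs + delta)
        self.obs, reward, done, info = self.env.step(victim_action)
        # Zero-sum: the adversary is rewarded when the operator fails.
        return self.obs, -reward, done, info
```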
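The proposed defense can be sketched in the same spirit: fine-tune the victim on observations perturbed by a trained attacker so it sees worst-case inputs during training. The `victim.act`, `victim.update`, and `adversary.act` interfaces below are hypothetical placeholders; only the alternation between the attacker's perturbations and the victim's updates is the point.

```python
import numpy as np

def adversarial_training(victim, adversary, env, epsilon=0.05, n_episodes=100):
    """Minimal adversarial-training sketch under the assumed interfaces:
    roll out episodes where a fixed attacker perturbs each observation,
    and update the victim on the attacked transitions."""
    for _ in range(n_episodes):
        obs, done = env.reset(), False
        while not done:
            # Perturb the observation with the attacker's policy, clipped
            # to the same budget used during the attack phase.
            delta = np.clip(adversary.act(obs), -epsilon, epsilon)
            action = victim.act(obs + delta)
            next_obs, reward, done, _ = env.step(action)
            # Train the victim on the attacked transition so it learns to
            # remain feasible under adversarially chosen sensor noise.
            victim.update(obs + delta, action, reward, next_obs, done)
            obs = next_obs
```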