Reinforcement learning (RL) is a promising optimal control technique for multi-energy management systems. It does not require a model a priori, reducing the upfront and ongoing project-specific engineering effort, and it is capable of learning better representations of the underlying system dynamics. However, vanilla RL does not provide constraint satisfaction guarantees, resulting in various unsafe interactions within its safety-critical environment. In this paper, we present two novel safe RL methods, namely SafeFallback and GiveSafe, in which the safety constraint formulation is decoupled from the RL formulation and which provide hard-constraint satisfaction guarantees both during training (exploration) and during exploitation of the (close-to) optimal policy. In a simulated multi-energy systems case study, we show that both methods start with a significantly higher utility (i.e. a useful policy) than a vanilla RL benchmark (94.6% and 82.8% compared to 35.5%) and that the proposed SafeFallback method can even outperform the vanilla RL benchmark (102.9% versus 100%). We conclude that both methods are viable safety constraint handling techniques applicable beyond RL, as demonstrated with random agents, while still providing hard-constraint guarantees. Finally, we propose fundamental future work to, among other things, improve the constraint functions themselves as more data becomes available.
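To make the decoupling idea concrete, the sketch below shows one possible way a safety layer could sit between any agent (RL or random) and the environment: a constraint function and a fallback action are defined independently of the policy, and unsafe proposed actions are overridden before execution. This is a minimal illustration under our own assumptions, not the paper's actual SafeFallback or GiveSafe implementation; the class name, the battery constraint, and the fallback choice are hypothetical.

```python
# Minimal, assumption-based sketch of a policy-agnostic safety layer
# (illustrative only; not the authors' SafeFallback/GiveSafe implementation).
class SafeFallbackWrapper:
    def __init__(self, constraint_fn, fallback_fn):
        # constraint_fn(state, action) -> bool: True if the action satisfies the hard constraints.
        # fallback_fn(state) -> action: a known-safe backup action for the current state.
        self.constraint_fn = constraint_fn
        self.fallback_fn = fallback_fn

    def filter(self, state, proposed_action):
        """Pass the proposed action through if it is safe, otherwise fall back."""
        if self.constraint_fn(state, proposed_action):
            return proposed_action
        return self.fallback_fn(state)


# Hypothetical usage with a toy battery constraint: never discharge below 0 kWh state of charge.
def battery_constraint(state, action):
    return state["soc"] + action >= 0.0  # action < 0 means discharging

safety_layer = SafeFallbackWrapper(battery_constraint, fallback_fn=lambda s: 0.0)
safe_action = safety_layer.filter({"soc": 0.5}, proposed_action=-2.0)  # overridden to 0.0
```

Because the constraint check only depends on the state and proposed action, the same layer can wrap an RL policy during both training and exploitation, or any non-learning agent.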