This paper proposes a safe reinforcement learning algorithm for generation bidding decisions and unit maintenance scheduling in a competitive electricity market environment. In this problem, each unit aims to find a bidding strategy that maximizes its revenue while retaining its reliability by scheduling preventive maintenance. The maintenance schedule imposes safety constraints that must be satisfied at all times. Satisfying these critical safety and reliability constraints while the generation units have incomplete information about each other's bidding strategies is a challenging problem. Bi-level optimization and reinforcement learning are state-of-the-art approaches for solving this type of problem; however, neither can handle the combined challenges of incomplete information and critical safety constraints. To tackle these challenges, we propose a safe deep deterministic policy gradient reinforcement learning algorithm that combines reinforcement learning with a predictive safety filter. The case study demonstrates that the proposed approach achieves higher profit than other state-of-the-art methods while satisfying the system safety constraints.
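The core mechanism described above — an RL actor whose actions are corrected by a safety filter before reaching the environment — can be illustrated with a minimal sketch. The bid bounds, the maintenance flag, and the projection-by-clipping rule below are hypothetical placeholders for illustration only, not the paper's actual market model or filter formulation.

```python
import numpy as np

def safety_filter(proposed_bid, bid_min, bid_max, must_maintain):
    """Project the policy's proposed bid onto a safe set (illustrative).

    If the unit is scheduled for preventive maintenance it must stay
    offline, so the safe action is a zero bid; otherwise the bid is
    clipped to an assumed admissible range [bid_min, bid_max].
    """
    if must_maintain:
        return 0.0
    return float(np.clip(proposed_bid, bid_min, bid_max))

# Hypothetical actor step: the DDPG actor proposes a raw bid, and the
# filter corrects it before it is submitted to the market environment.
raw_action = 72.5  # $/MWh, proposed by the actor network (made-up value)
safe_action = safety_filter(raw_action, bid_min=20.0, bid_max=60.0,
                            must_maintain=False)
```

In this sketch the filter is a simple projection; the paper's predictive safety filter additionally accounts for the maintenance schedule's constraints over future time steps, which a one-step clip does not capture.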