With the increasing penetration of distributed energy resources, distributed optimization algorithms have attracted significant attention for power systems applications due to their potential for superior scalability, privacy, and robustness against single points of failure. The Alternating Direction Method of Multipliers (ADMM) is a popular distributed optimization algorithm; however, its convergence performance is highly dependent on the selection of penalty parameters, which are usually chosen heuristically. In this work, we use reinforcement learning (RL) to develop an adaptive penalty parameter selection policy for the AC optimal power flow (ACOPF) problem solved via ADMM, with the goal of minimizing the number of iterations until convergence. We train our RL policy using deep Q-learning and show that this policy can significantly accelerate convergence (up to a 59% reduction in the number of iterations compared to existing, curvature-informed penalty parameter selection methods). Furthermore, we show that our RL policy demonstrates promising generalizability, performing well under unseen loading schemes as well as under unseen losses of lines and generators (up to a 50% reduction in iterations). This work thus provides a proof-of-concept for using RL for parameter selection in ADMM for power systems applications.
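To make the idea concrete, the following is a minimal sketch (not the paper's implementation) of consensus ADMM with an adaptive penalty parameter, applied to a toy two-agent least-squares problem rather than ACOPF. The function `choose_rho_action`, the action set `ACTIONS`, and the toy problem data are illustrative assumptions: in the paper the multiplicative update to the penalty parameter would be selected by a trained deep Q-network, whereas here a simple residual-based rule stands in for that learned policy.

```python
# Minimal sketch (illustrative only): consensus ADMM on a toy two-agent
# least-squares problem, with the penalty parameter rho updated each iteration
# by a stand-in "policy" in place of the trained deep Q-network.
import numpy as np

# Toy problem: two agents share a consensus variable z; agent i minimizes
# 0.5 * ||A_i x_i - b_i||^2 subject to x_i = z.
rng = np.random.default_rng(0)
A = [rng.standard_normal((8, 3)) for _ in range(2)]
b = [rng.standard_normal(8) for _ in range(2)]

ACTIONS = np.array([0.5, 1.0, 2.0])  # multiplicative rho updates the policy can pick


def choose_rho_action(primal_res, dual_res):
    """Stand-in for the learned Q-network: map the observed residual state to
    one of the multiplicative actions (here via a simple heuristic rule)."""
    if primal_res > 10 * dual_res:
        return 2.0   # primal residual dominates -> increase rho
    if dual_res > 10 * primal_res:
        return 0.5   # dual residual dominates -> decrease rho
    return 1.0


def admm(rho=1.0, max_iter=200, tol=1e-6):
    n = A[0].shape[1]
    x = [np.zeros(n) for _ in range(2)]
    u = [np.zeros(n) for _ in range(2)]   # scaled dual variables
    z = np.zeros(n)
    for k in range(max_iter):
        # Local x-updates: argmin 0.5||A_i x - b_i||^2 + (rho/2)||x - z + u_i||^2
        for i in range(2):
            x[i] = np.linalg.solve(A[i].T @ A[i] + rho * np.eye(n),
                                   A[i].T @ b[i] + rho * (z - u[i]))
        z_old = z
        z = np.mean([x[i] + u[i] for i in range(2)], axis=0)  # consensus update
        for i in range(2):
            u[i] = u[i] + x[i] - z                            # dual update
        primal_res = np.sqrt(sum(np.linalg.norm(x[i] - z) ** 2 for i in range(2)))
        dual_res = rho * np.sqrt(2) * np.linalg.norm(z - z_old)
        if primal_res < tol and dual_res < tol:
            return z, k + 1
        # Adaptive penalty: in the paper this action would come from the RL policy.
        action = choose_rho_action(primal_res, dual_res)
        rho *= action
        if action != 1.0:
            # Rescaling rho requires rescaling the scaled dual variables u = y / rho.
            for i in range(2):
                u[i] = u[i] / action
    return z, max_iter


z_star, iters = admm()
print(f"converged in {iters} iterations, z = {z_star}")
```

The key design choice this sketch illustrates is that the penalty parameter is treated as a per-iteration control input observed through the primal and dual residuals, which is what allows an RL agent to learn an update policy aimed at minimizing the iteration count.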