This paper proposes a one-step two-critic deep reinforcement learning (OSTC-DRL) approach for inverter-based volt-var control (IB-VVC) in active distribution networks. First, because IB-VVC can be formulated as a single-period optimization problem, we model it as a one-step Markov decision process rather than a standard Markov decision process, which simplifies the DRL learning task. We then design a one-step actor-critic DRL scheme, a simplified version of recent DRL algorithms that avoids the problem of Q-value overestimation. Furthermore, since VVC has two objectives, minimizing power loss and eliminating voltage violations, we use two critics to approximate the rewards of the two objectives separately. This simplifies the approximation task of each critic and avoids interaction between the two objectives during critic learning. The OSTC-DRL approach combines the one-step actor-critic DRL scheme with the two-critic technique. Based on OSTC-DRL, we design two centralized DRL algorithms. We further extend OSTC-DRL to multi-agent OSTC-DRL for decentralized IB-VVC and design two multi-agent DRL algorithms. Simulations demonstrate that the proposed OSTC-DRL converges faster and achieves better control performance, and that multi-agent OSTC-DRL works well for decentralized IB-VVC problems.
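To make the one-step, two-critic idea concrete, the following is a minimal sketch on a toy problem, not the algorithms designed in the paper. Everything in it is an assumption made for illustration: the scalar state `s` and action `a`, the quadratic reward proxies in `toy_rewards`, the quadratic-feature critics fitted by least squares, and the linear policy `a = theta * s`. It shows only the two structural points made above: each critic regresses directly on the immediate reward of its own objective (no bootstrapped next-state value), and the actor follows the gradient of the sum of the two critics.

```python
import numpy as np

rng = np.random.default_rng(0)

def features(s, a):
    # Quadratic features shared by both critics (an assumption of this toy).
    return np.stack([np.ones_like(s), s, a, s * a, s**2, a**2], axis=1)

def toy_rewards(s, a):
    # Hypothetical stand-ins for the two VVC objectives, not a power-flow model.
    r_loss = -a**2          # proxy for the power-loss reward
    r_volt = -(s + a)**2    # proxy for the voltage-violation reward
    return r_loss, r_volt

def fit_critic(phi, r):
    # One-step target: regress on the immediate reward only,
    # with no bootstrapped next-state value in the target.
    w, *_ = np.linalg.lstsq(phi, r, rcond=None)
    return w

def dq_da(w, s, a):
    # Derivative of the quadratic critic w . features(s, a) with respect to a.
    return w[2] + w[3] * s + 2.0 * w[5] * a

def train(iters=200, batch=256, lr=0.5, noise=0.3):
    theta = 0.0  # linear deterministic policy a = theta * s (assumption)
    for _ in range(iters):
        s = rng.uniform(-1.0, 1.0, batch)
        a = theta * s + noise * rng.standard_normal(batch)  # exploratory actions
        r_loss, r_volt = toy_rewards(s, a)
        phi = features(s, a)
        # Two critics, each fitted to the reward of a single objective.
        w_loss = fit_critic(phi, r_loss)
        w_volt = fit_critic(phi, r_volt)
        # Actor step: ascend the sum of the two critics at the policy action.
        a_pi = theta * s
        grad = np.mean((dq_da(w_loss, s, a_pi) + dq_da(w_volt, s, a_pi)) * s)
        theta += lr * grad
    return theta

if __name__ == "__main__":
    print(train())
```

In this toy the summed reward is maximized by `theta = -0.5`, and the learned gain approaches that value. Because each critic target is the immediate reward itself, no maximization over estimated next-state values enters the target, which is the mechanism behind the overestimation argument in the abstract.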