Bi-level Off-policy Reinforcement Learning for Volt/VAR Control Involving Continuous and Discrete Devices

In Volt/Var control (VVC) of active distribution networks (ADNs), both slow-timescale discrete devices (STDDs) and fast-timescale continuous devices (FTCDs) are involved. STDDs such as on-load tap changers (OLTCs) and FTCDs such as distributed generators should be coordinated in time sequence. Such VVC is formulated as a two-timescale optimization problem that jointly optimizes FTCDs and STDDs in ADNs. Traditional optimization methods rely heavily on accurate system models, but are sometimes impractical because of the unaffordable modelling effort they require. In this paper, a novel bi-level off-policy reinforcement learning (RL) algorithm is proposed to solve this problem in a model-free manner. A bi-level Markov decision process (BMDP) is defined to describe the two-timescale VVC problem, and separate agents are set up for the slow- and fast-timescale sub-problems. For the fast-timescale sub-problem, we adopt soft actor-critic, an off-policy RL method with high sample efficiency. For the slow one, we develop an off-policy multi-discrete soft actor-critic (MDSAC) algorithm to address the curse of dimensionality with various STDDs. To mitigate the non-stationarity in the two agents' learning processes, we propose a multi-timescale off-policy correction (MTOPC) method based on importance sampling. Comprehensive numerical studies demonstrate that the proposed method achieves stable and satisfactory optimization of both STDDs and FTCDs without any model information, and that it outperforms existing two-timescale VVC methods.
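The abstract does not spell out how MDSAC or MTOPC are built, so the following is only a minimal sketch of two generic ideas it alludes to, under stated assumptions. It assumes that a "multi-discrete" policy means one independent categorical head per STDD, so the joint log-probability and entropy are sums over devices and the output size grows with the sum rather than the product of the per-device action counts. It also shows a generic clipped importance-sampling ratio of the kind commonly used for off-policy correction. All function names, the clipping threshold, and the example device sizes are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one device's action logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def joint_log_prob(per_device_logits: Sequence[Sequence[float]],
                   action: Sequence[int]) -> float:
    """log pi(a) = sum_i log pi_i(a_i), assuming independent per-device heads."""
    return sum(math.log(softmax(logits)[a])
               for logits, a in zip(per_device_logits, action))


def joint_entropy(per_device_logits: Sequence[Sequence[float]]) -> float:
    """Entropy of the factorized policy: the sum of per-device entropies."""
    total = 0.0
    for logits in per_device_logits:
        probs = softmax(logits)
        total -= sum(p * math.log(p) for p in probs if p > 0.0)
    return total


def output_sizes(action_counts: Sequence[int]) -> Tuple[int, int]:
    """(factorized output size, flat joint-action output size)."""
    return sum(action_counts), math.prod(action_counts)


def importance_ratio(logp_current: float, logp_behavior: float,
                     clip: float = 10.0) -> float:
    """Generic clipped ratio pi_current(a|s) / pi_behavior(a|s).

    The clip value is an illustrative assumption, not a value from the paper.
    """
    return min(math.exp(logp_current - logp_behavior), clip)


# Hypothetical example: one 33-position tap changer and two 5-step devices.
factorized, flat = output_sizes([33, 5, 5])
```

In this hypothetical setting the factorized policy needs 43 outputs, whereas a flat policy over all joint actions would need 825, which illustrates why factorization helps as the number of discrete devices grows.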


