Restoring power distribution systems (PDS) after large-scale outages requires sequential switching operations that reconfigure feeder topology and coordinate distributed energy resources (DERs) under nonlinear constraints such as power balance, voltage limits, and thermal ratings. These challenges make conventional optimization and value-based RL approaches computationally inefficient and difficult to scale. This paper applies a Heterogeneous-Agent Reinforcement Learning (HARL) framework, instantiated through Heterogeneous-Agent Proximal Policy Optimization (HAPPO), to enable coordinated restoration across interconnected microgrids. Each agent controls a distinct microgrid with different loads, DER capacities, and switch counts, introducing practical structural heterogeneity. Decentralized actor policies are trained with a centralized critic to compute advantage values for stable on-policy updates. A physics-informed OpenDSS environment provides full power flow feedback and enforces operational limits via differentiable penalty signals rather than invalid action masking. The total DER generation is capped at 2400 kW, and each microgrid must satisfy local supply-demand feasibility. Experiments on the IEEE 123-bus and IEEE 8500-node systems show that HAPPO achieves faster convergence, higher restored power, and smoother multi-seed training than DQN, PPO, MAES, MAGDPG, MADQN, Mean-Field RL, and QMIX. Results demonstrate that incorporating microgrid-level heterogeneity within the HARL framework yields a scalable, stable, and constraint-aware solution for complex PDS restoration.
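To illustrate how operational limits can be enforced through penalty signals rather than invalid action masking, the following is a minimal sketch of a penalty-shaped team reward. It is not the reward function used in this work. The only value taken from the text is the 2400 kW cap on total DER generation. The `MicrogridState` fields, the 0.95 to 1.05 p.u. voltage band, and the two penalty weights are assumptions chosen for illustration. In a real environment the per-microgrid quantities would come from an OpenDSS power flow solution; here they are supplied by hand.

```python
from dataclasses import dataclass
from typing import Sequence

DER_CAP_KW = 2400.0  # total DER generation cap stated in the abstract


@dataclass
class MicrogridState:
    """Hypothetical per-microgrid summary after one switching step."""
    restored_load_kw: float  # load picked up inside this microgrid
    der_output_kw: float     # DER generation dispatched inside this microgrid
    min_voltage_pu: float    # lowest bus voltage in this microgrid
    max_voltage_pu: float    # highest bus voltage in this microgrid


def penalized_reward(
    states: Sequence[MicrogridState],
    v_min: float = 0.95,             # assumed lower voltage limit (p.u.)
    v_max: float = 1.05,             # assumed upper voltage limit (p.u.)
    power_weight: float = 1.0,       # assumed penalty per kW of violation
    voltage_weight: float = 1000.0,  # assumed penalty per p.u. of violation
) -> float:
    """Return restored power minus soft penalties for constraint violations.

    No action is masked: an infeasible action is allowed, and its reward is
    reduced in proportion to how far it exceeds each limit.
    """
    restored = sum(s.restored_load_kw for s in states)

    # Global limit: total DER generation must not exceed the cap.
    total_der = sum(s.der_output_kw for s in states)
    cap_violation = max(0.0, total_der - DER_CAP_KW)

    # Local supply-demand feasibility: restored load must be covered by local DERs.
    balance_violation = sum(
        max(0.0, s.restored_load_kw - s.der_output_kw) for s in states
    )

    # Voltage limits: total excursion outside the allowed band.
    voltage_violation = sum(
        max(0.0, v_min - s.min_voltage_pu) + max(0.0, s.max_voltage_pu - v_max)
        for s in states
    )

    penalty = (
        power_weight * (cap_violation + balance_violation)
        + voltage_weight * voltage_violation
    )
    return restored - penalty


if __name__ == "__main__":
    feasible = [
        MicrogridState(500.0, 600.0, 0.98, 1.02),
        MicrogridState(800.0, 900.0, 0.97, 1.03),
    ]
    over_cap = [
        MicrogridState(1000.0, 1500.0, 0.98, 1.02),
        MicrogridState(1000.0, 1200.0, 0.97, 1.03),
    ]
    print(penalized_reward(feasible))  # no violations, reward equals restored kW
    print(penalized_reward(over_cap))  # 300 kW over the cap is subtracted
```

Because the penalty grows with the size of the violation, the agents still receive a graded learning signal from infeasible actions, which is the practical difference from masking those actions out entirely.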