We present the PowerGridworld software package to provide users with a lightweight, modular, and customizable framework for creating power-systems-focused, multi-agent Gym environments that readily integrate with existing training frameworks for reinforcement learning (RL). Although many frameworks exist for training multi-agent RL (MARL) policies, none enable rapid prototyping and development of the environments themselves, especially in the context of heterogeneous (composite, multi-device) power systems where power flow solutions are required to define grid-level variables and costs. PowerGridworld is an open-source software package that helps to fill this gap. To highlight PowerGridworld's key features, we present two case studies and demonstrate learning MARL policies using both OpenAI's multi-agent deep deterministic policy gradient (MADDPG) and RLlib's proximal policy optimization (PPO) algorithms. In both cases, at least some subset of the agents incorporates elements of the power flow solution at each time step as part of their reward (negative cost) structures.
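To make the environment structure described above concrete, the following is a minimal sketch of the kind of dict-keyed multi-agent Gym environment the package targets, in which a shared (here, stubbed) power flow solve couples per-device agents through a grid-level cost term. All names here (MultiAgentGridEnv, _solve_power_flow, the agent IDs) are illustrative assumptions, not the PowerGridworld API; the dict-of-agents step convention follows RLlib's MultiAgentEnv interface.

```python
# Illustrative sketch only: class and method names are hypothetical, not the
# PowerGridworld API. It shows the dict-keyed multi-agent Gym pattern (as in
# RLlib's MultiAgentEnv) with a grid-level cost shared across agent rewards.
import numpy as np
import gym


class MultiAgentGridEnv(gym.Env):
    """Toy composite environment: each agent controls one device setpoint;
    a stubbed power flow solve couples their rewards at every step."""

    def __init__(self, agent_ids=("battery", "hvac")):
        self.agent_ids = list(agent_ids)
        # One scalar setpoint per device; one scalar observation (net load).
        self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(1,), dtype=np.float32)
        self.observation_space = gym.spaces.Box(
            -np.inf, np.inf, shape=(1,), dtype=np.float32
        )

    def _solve_power_flow(self, action_dict):
        # Stand-in for a real power flow solution: grid cost is just the
        # squared net injection summed over all devices.
        net = sum(float(a[0]) for a in action_dict.values())
        return {"net_load": net, "grid_cost": net ** 2}

    def reset(self):
        return {aid: np.zeros(1, dtype=np.float32) for aid in self.agent_ids}

    def step(self, action_dict):
        pf = self._solve_power_flow(action_dict)
        obs = {
            aid: np.array([pf["net_load"]], dtype=np.float32)
            for aid in self.agent_ids
        }
        # Each agent's reward (negative cost) includes the grid-level term,
        # mirroring how power flow outputs enter agent costs in the paper.
        rewards = {aid: -pf["grid_cost"] for aid in self.agent_ids}
        dones = {aid: False for aid in self.agent_ids}
        dones["__all__"] = False
        return obs, rewards, dones, {aid: {} for aid in self.agent_ids}
```

Because the environment exposes the standard per-agent dict interface, it can be handed to off-the-shelf MARL trainers (e.g., RLlib's PPO) without environment-specific glue, which is the integration property the abstract emphasizes.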