Single-Agent (SA) Reinforcement Learning systems have shown outstanding results on non-stationary problems. However, Multi-Agent Reinforcement Learning (MARL) can surpass SA systems both in general and when scaling. Furthermore, MA systems can be substantially enhanced by collaboration, which can happen through observing others or through a communication system used to share information between collaborators. Here, we developed a distributed MA learning mechanism with the ability to communicate, based on decentralised partially observable Markov decision processes (Dec-POMDPs) and Graph Neural Networks (GNNs). Collaborative MA mechanisms can minimise the time and energy consumed by training Machine Learning models while improving performance. We demonstrate this in a real-world scenario, an offshore wind farm comprising a set of distributed wind turbines, where the objective is to maximise collective efficiency. Compared to an SA system, MA collaboration has shown significantly reduced training time and higher cumulative rewards in unseen and scaled scenarios.
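To make the GNN-based communication idea concrete, below is a minimal illustrative sketch of one round of message passing between agents, assuming PyTorch. It is not the paper's implementation: the class name CommGNNLayer, the four-turbine line-graph adjacency, the observation and hidden dimensions, and the three-action policy head are all hypothetical choices for this example.

```python
import torch
import torch.nn as nn

class CommGNNLayer(nn.Module):
    """One round of message passing: each agent (node) aggregates
    messages from its neighbours, mimicking a communication channel."""
    def __init__(self, obs_dim, hidden_dim):
        super().__init__()
        self.msg = nn.Linear(obs_dim, hidden_dim)                # message encoder
        self.upd = nn.Linear(obs_dim + hidden_dim, hidden_dim)   # node update

    def forward(self, h, adj):
        # h:   (n_agents, obs_dim) local partial observations/embeddings
        # adj: (n_agents, n_agents) adjacency of the communication graph
        msgs = self.msg(h)                                       # encode outgoing messages
        agg = adj @ msgs / adj.sum(1, keepdim=True).clamp(min=1) # mean over neighbours
        return torch.relu(self.upd(torch.cat([h, agg], dim=-1)))

# Hypothetical usage: 4 turbines on a line graph, each with a partial observation.
n, obs_dim, hid = 4, 8, 16
adj = torch.tensor([[0, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [0, 0, 1, 0]], dtype=torch.float)
obs = torch.randn(n, obs_dim)           # each agent's local observation
policy_head = nn.Linear(hid, 3)         # e.g. 3 discrete control actions per turbine
h = CommGNNLayer(obs_dim, hid)(obs, adj)
logits = policy_head(h)                 # per-agent action logits
print(logits.shape)                     # torch.Size([4, 3])
```

In a Dec-POMDP setting of this kind, each agent acts on its own observation, so the aggregation step is the only place where neighbours' information enters an agent's decision; stacking several such layers would widen each agent's effective receptive field over the farm.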