Individual agents in a multi-agent system (MAS) may have decoupled open-loop dynamics, but a cooperative control objective usually results in coupled closed-loop dynamics, thereby making the control design computationally expensive. The computation time grows even further when a learning strategy such as reinforcement learning (RL) must be applied to handle the case where the agent dynamics are unknown. To resolve this problem, we propose a parallel RL scheme for linear quadratic regulator (LQR) design in a continuous-time linear MAS. The idea is to exploit the structural properties of two graphs embedded in the $Q$ and $R$ weighting matrices of the LQR objective to define an orthogonal transformation that converts the original LQR design into multiple decoupled, smaller-sized LQR designs. We show that if the MAS is homogeneous, then this decomposition retains closed-loop optimality. We present conditions for decomposability, an algorithm for constructing the transformation matrix, a parallel RL algorithm, and a robustness analysis for the case where the design is applied to a non-homogeneous MAS. Simulations show that the proposed approach yields a significant speed-up in learning without any loss in the cumulative value of the LQR cost.
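As a rough numerical illustration of the decomposition idea (not the paper's RL algorithm, which is data-driven and does not assume knowledge of the agent dynamics), consider a homogeneous MAS whose global state weight carries a graph structure through a Kronecker product. The agent model (double integrator), path graph, and weight matrices below are illustrative assumptions. The orthogonal eigenvector matrix of the graph-based weight induces a transformation that splits the single coupled Riccati equation into $N$ small ones, and the reassembled solution matches the one obtained from the full coupled design:
\begin{verbatim}
# Illustrative sketch (model-based, with assumed agent/graph data): an
# orthogonal transformation built from the graph eigenvectors splits one
# coupled LQR Riccati equation into N smaller, independent ones.
import numpy as np
from scipy.linalg import solve_continuous_are, block_diag

N = 4                                    # number of agents
A = np.array([[0., 1.], [0., 0.]])       # identical open-loop agent dynamics
B = np.array([[0.], [1.]])
Q0, R0 = np.eye(2), np.eye(1)            # per-agent weights

# Path-graph Laplacian; adding the identity keeps the coupling pattern but
# makes the global state weight positive definite.
Lap = np.array([[ 1., -1.,  0.,  0.],
                [-1.,  2., -1.,  0.],
                [ 0., -1.,  2., -1.],
                [ 0.,  0., -1.,  1.]])
W = Lap + np.eye(N)

# Full coupled LQR design on the stacked system.
A_g, B_g = np.kron(np.eye(N), A), np.kron(np.eye(N), B)
Q_g, R_g = np.kron(W, Q0), np.kron(np.eye(N), R0)
P_full = solve_continuous_are(A_g, B_g, Q_g, R_g)

# Orthogonal transformation T = kron(V, I) built from the eigenvectors of W.
lam, V = np.linalg.eigh(W)
T = np.kron(V, np.eye(2))

# N decoupled small designs: same (A, B), state weight scaled by lam_i.
P_i = [solve_continuous_are(A, B, l * Q0, R0) for l in lam]
P_parallel = T @ block_diag(*P_i) @ T.T

print(np.allclose(P_full, P_parallel))   # True: identical optimal solution
\end{verbatim}
In the paper's setting, each of the small decoupled designs would instead be learned in parallel by an RL routine from measured data, rather than computed from the model-based Riccati equations used in this sketch.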