D3PI: Data-Driven Distributed Policy Iteration for Homogeneous Interconnected Systems

Control of large-scale networked systems often necessitates complex models of the interactions among the agents. While building accurate models of these interactions can become prohibitive in many applications, data-driven control methods can circumvent model complexity by synthesizing a controller directly from observed data. In this paper, we propose the Data-Driven Distributed Policy Iteration (D3PI) algorithm to design a feedback mechanism for a potentially large system whose underlying graph structure characterizes the communication among the agents. Rather than requiring access to the system parameters, our algorithm relies on temporary "auxiliary" links that boost information exchange within a small portion of the graph during the learning phase. During this phase, the costs are partitioned between learning and non-learning agents to ensure consistent control of the entire network. Once the learning process terminates, a distributed policy for the entire networked system is constructed from the components estimated during the learning phase. By exploiting the structure that the graph topology and the temporary links induce on the system parameters, we provide extensive stability and convergence guarantees for the proposed distributed controller throughout the learning phase. The practicality of our method is then illustrated with a simulation.
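The core idea of replacing a model with data inside policy iteration can be illustrated with a minimal, centralized sketch. This is not the D3PI algorithm itself: it has no auxiliary links, no partition of learning and non-learning agents, and no distributed policy. It is the classical Q-function policy iteration for discrete-time LQR (in the style of Bradtke, Ydstie and Barto), which learns a feedback gain from (x, u, x_next) samples alone. Everything below is an assumption for illustration: discrete-time linear dynamics with quadratic costs, N identical agents coupled through a graph Laplacian L as A_net = kron(I_N, A) - c kron(L, I_n) and B_net = kron(I_N, B), and the helper names network_matrices, collect_data, evaluate_q, q_policy_iteration and riccati_gain, which are all hypothetical.

```python
import numpy as np

# Hypothetical homogeneous network: N identical agents (A, B) coupled through the
# graph Laplacian L with strength c. Assumed model, not the paper's parameterization.
def network_matrices(A, B, L, c):
    N, n = L.shape[0], A.shape[0]
    A_net = np.kron(np.eye(N), A) - c * np.kron(L, np.eye(n))
    B_net = np.kron(np.eye(N), B)
    return A_net, B_net


def collect_data(A_net, B_net, num_samples, rng):
    # The simulator stands in for the unknown plant: the learner below only
    # sees (x, u, x_next) tuples, never A_net or B_net.
    nx, nu = B_net.shape
    X = rng.standard_normal((num_samples, nx))
    U = rng.standard_normal((num_samples, nu))  # exploratory inputs
    Xn = X @ A_net.T + U @ B_net.T
    return X, U, Xn


def evaluate_q(X, U, Xn, K, Qc, Rc):
    # Least-squares fit of Q_K(x, u) = z' H z with z = [x; u], using the Bellman
    # equation z' H z - z+' H z+ = x' Qc x + u' Rc u, where z+ = [x_next; -K x_next].
    rows, costs = [], []
    for x, u, xn in zip(X, U, Xn):
        z = np.concatenate([x, u])
        zn = np.concatenate([xn, -K @ xn])
        rows.append(np.kron(z, z) - np.kron(zn, zn))
        costs.append(x @ Qc @ x + u @ Rc @ u)
    h, *_ = np.linalg.lstsq(np.array(rows), np.array(costs), rcond=None)
    d = X.shape[1] + U.shape[1]
    H = h.reshape(d, d)
    return 0.5 * (H + H.T)  # only the symmetric part of H is identifiable


def q_policy_iteration(X, U, Xn, K0, Qc, Rc, iters=15):
    # K0 must be stabilizing; each step evaluates the current policy from data.
    nx = X.shape[1]
    K = K0
    for _ in range(iters):
        H = evaluate_q(X, U, Xn, K, Qc, Rc)   # policy evaluation
        Huu, Hux = H[nx:, nx:], H[nx:, :nx]
        K = np.linalg.solve(Huu, Hux)         # policy improvement, u = -K x
    return K


def riccati_gain(A, B, Qc, Rc, tol=1e-12, max_iter=10_000):
    # Model-based reference: iterate the discrete-time Riccati recursion.
    P = Qc.copy()
    for _ in range(max_iter):
        G = np.linalg.solve(Rc + B.T @ P @ B, B.T @ P @ A)
        P_next = Qc + A.T @ P @ (A - B @ G)
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    return np.linalg.solve(Rc + B.T @ P @ B, B.T @ P @ A)


rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
L = np.array([[1.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]])  # path graph, 3 agents
A_net, B_net = network_matrices(A, B, L, c=0.05)
Qc, Rc = np.eye(6), np.eye(3)
X, U, Xn = collect_data(A_net, B_net, 200, rng)
K = q_policy_iteration(X, U, Xn, np.zeros((3, 6)), Qc, Rc)  # open loop is stable, so K0 = 0
print(np.allclose(K, riccati_gain(A_net, B_net, Qc, Rc), atol=1e-6))
```

The final line compares the gain learned from data with the model-based Riccati gain. The Kronecker form makes the graph-induced structure of the network matrices explicit, which is the kind of structure the abstract says D3PI exploits; this sketch, however, learns one centralized gain for the whole network and ignores that structure during learning.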
