While notable progress has been made in specifying and learning objectives for general cyber-physical systems, applying these methods to distributed multi-agent systems still poses significant challenges. Among these are the need to (a) craft specification primitives that allow expression and interplay of both local and global objectives, (b) tame the explosion in the state and action spaces to enable effective learning, and (c) minimize coordination frequency and the set of engaged participants for global objectives. To address these challenges, we propose a novel specification framework that allows natural composition of local and global objectives used to guide training of a multi-agent system. Our technique enables learning expressive policies that allow agents to operate in a coordination-free manner for local objectives, while using a decentralized communication protocol for enforcing global ones. Experimental results support our claim that sophisticated multi-agent distributed planning problems can be effectively realized using specification-guided learning.