In a cooperative multiagent system, a collection of agents executes a joint policy in order to achieve a common objective. The successful deployment of such systems hinges on the availability of reliable inter-agent communication. In practice, however, communication can be disrupted by many sources, such as radio interference, hardware failure, and adversarial attacks. In this work, we develop joint policies for cooperative multiagent systems that are robust to potential losses in communication. More specifically, we develop joint policies for cooperative Markov games with reach-avoid objectives. First, we propose an algorithm for the decentralized execution of joint policies during periods of communication loss. Next, we use the total correlation of the state-action process induced by a joint policy as a measure of the intrinsic dependencies between the agents. We then use this measure to lower-bound the performance of a joint policy when communication is lost. Finally, we present an algorithm that maximizes a proxy to this lower bound in order to synthesize minimum-dependency joint policies that are robust to communication loss. Numerical experiments show that the proposed minimum-dependency policies require minimal coordination between the agents while incurring little to no loss in performance: the total correlation value of the synthesized policy is one fifth of that of the baseline policy, which does not take potential communication losses into account. As a result, the performance of the minimum-dependency policies remains consistently high regardless of whether communication is available. By contrast, the performance of the baseline policy decreases by twenty percent when communication is lost.
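As a rough illustration of the dependency measure named above, the sketch below computes total correlation in its standard information-theoretic form: the sum of the marginal entropies minus the joint entropy. It applies this to a single joint distribution over two agents' actions, whereas the abstract refers to the total correlation of the whole state-action process, so this is a simplified stand-in rather than the authors' computation. The function names and the two example distributions are hypothetical.

```python
import numpy as np


def entropy(p):
    """Shannon entropy in bits of a probability array (zero entries are skipped)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))


def total_correlation(joint):
    """Sum of marginal entropies minus the joint entropy, in bits.

    `joint` is an n-dimensional array whose axis i indexes the outcomes
    of variable i (here: the action of agent i).
    """
    joint = np.asarray(joint, dtype=float)
    marginal_entropies = 0.0
    for axis in range(joint.ndim):
        # Marginalize out every variable except the one on `axis`.
        other_axes = tuple(a for a in range(joint.ndim) if a != axis)
        marginal_entropies += entropy(joint.sum(axis=other_axes))
    return marginal_entropies - entropy(joint)


# Hypothetical two-agent action distributions (assumed for illustration only).
independent = np.outer([0.5, 0.5], [0.5, 0.5])  # agents choose independently
coupled = np.array([[0.5, 0.0], [0.0, 0.5]])    # agents always pick the same action

tc_independent = total_correlation(independent)
tc_coupled = total_correlation(coupled)
```

Under this definition, independently acting agents have a total correlation of zero, while agents whose actions are fully coupled have a strictly positive value. This matches the intuition in the abstract that a low total correlation indicates a policy that needs little coordination.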