We present an approach to reducing the communication required between agents in a multi-agent learning system by exploiting the inherent robustness of the underlying Markov decision process. We compute so-called robustness surrogate functions offline, which give agents a conservative indication of how far their state measurements can deviate before they need to update the other agents in the system. This yields fully distributed decision functions, enabling each agent to decide locally when it is necessary to update the others. We derive bounds on the optimality of the resulting system, in terms of the discounted sum of rewards obtained, and show that these bounds are a function of the design parameters. Additionally, we extend the results to the case where the robustness surrogate functions are learned from data, and present experimental results demonstrating a significant reduction in communication events between agents.
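To illustrate the distributed decision functions described above, the following is a minimal sketch of an event-triggered update rule. It assumes a Euclidean deviation measure and a generic `surrogate` callable standing in for the precomputed robustness surrogate; all names and the surrounding loop structure are illustrative, not the paper's actual implementation.

```python
import numpy as np

def should_communicate(x_current, x_last_sent, surrogate):
    """Local communication trigger for one agent (hypothetical sketch).

    `surrogate(x)` represents a precomputed robustness surrogate: a
    conservative bound on how far the agent's state may drift from the
    value it last broadcast before the other agents must be updated.
    """
    deviation = np.linalg.norm(x_current - x_last_sent)
    return deviation > surrogate(x_last_sent)

# Illustrative per-step usage inside an agent's control loop:
#   if should_communicate(x, x_sent, surrogate):
#       broadcast(x)   # update the other agents
#       x_sent = x     # reset the trigger's reference state
```

Because the trigger depends only on the agent's own state and its last broadcast value, it can be evaluated without any coordination, which is what makes the decision fully distributed.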