This paper considers independent reinforcement learning (IRL) for multi-agent decision-making under the federated learning (FL) paradigm. We show that FL can substantially improve the policy performance of IRL in terms of training efficiency and stability. However, since the policy parameters are trained locally and iteratively aggregated through a central server in FL, the frequent information exchange incurs substantial communication overhead. To balance improving the model's convergence performance against the required communication and computation overhead, this paper proposes a system utility function and develops a consensus-based optimization scheme on top of the periodic averaging method, introducing the consensus algorithm into FL for the exchange of local model gradients. The paper also provides novel convergence guarantees for the developed method, and demonstrates through theoretical analysis and numerical simulations its effectiveness and efficiency in improving the system utility.
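To make the described scheme concrete, the following is a minimal sketch of periodic parameter averaging combined with consensus-based exchange of local gradients among agents. It is not the authors' exact algorithm: the toy quadratic objective, the ring mixing matrix, and all names (N_AGENTS, TAU, W, local_gradient) are illustrative assumptions.

```python
# Sketch (assumed, not the paper's method): periodic averaging in FL,
# augmented with a consensus step that mixes local gradients between agents.
import numpy as np

N_AGENTS = 4      # number of independent learners
DIM = 8           # dimension of each local policy parameter vector
TAU = 5           # period (in local steps) between global averaging rounds
STEPS = 50        # total number of local update steps
LR = 0.1          # local learning rate

rng = np.random.default_rng(0)

# Doubly-stochastic mixing matrix over a ring topology (consensus weights).
W = np.zeros((N_AGENTS, N_AGENTS))
for i in range(N_AGENTS):
    W[i, i] = 0.5
    W[i, (i + 1) % N_AGENTS] = 0.25
    W[i, (i - 1) % N_AGENTS] = 0.25

# Each agent's "environment": a toy quadratic loss ||theta - target_i||^2
# standing in for the policy-gradient objective of a real IRL agent.
targets = rng.normal(size=(N_AGENTS, DIM))
thetas = rng.normal(size=(N_AGENTS, DIM))   # local policy parameters

def local_gradient(theta, target):
    """Gradient of the toy local objective; a real IRL agent would instead
    return a stochastic policy gradient estimated from its own trajectories."""
    return 2.0 * (theta - target)

for t in range(1, STEPS + 1):
    # 1) Each agent computes its local gradient independently.
    grads = np.stack([local_gradient(thetas[i], targets[i])
                      for i in range(N_AGENTS)])

    # 2) Consensus step: agents mix gradients with their neighbours,
    #    so not every update has to be routed through the central server.
    mixed_grads = W @ grads

    # 3) Local update with the consensus-mixed gradient.
    thetas = thetas - LR * mixed_grads

    # 4) Periodic averaging: every TAU steps the server aggregates the
    #    local parameters and broadcasts the average back to all agents.
    if t % TAU == 0:
        global_theta = thetas.mean(axis=0)
        thetas = np.tile(global_theta, (N_AGENTS, 1))

print("final disagreement:", np.linalg.norm(thetas - thetas.mean(axis=0)))
```

The averaging period TAU is the knob the abstract alludes to: a larger period reduces communication with the server, while the consensus step keeps the locally trained parameters from drifting too far apart between averaging rounds.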