This paper considers a distributed version of deep reinforcement learning (DRL) for multi-agent decision-making in the federated learning paradigm. Since the deep neural network models in federated learning are trained locally and aggregated iteratively through a central server, frequent information exchange incurs substantial communication overhead. Moreover, due to the heterogeneity of agents, Markov state-transition trajectories from different agents are usually unsynchronized within the same time interval, which further affects the convergence bound of the aggregated deep neural network models. It is therefore vital to evaluate the effectiveness of different optimization methods in a principled way. Accordingly, this paper proposes a utility function that captures the trade-off between reducing communication overhead and improving convergence performance. Meanwhile, this paper develops two new optimization methods on top of variation-aware periodic averaging methods: 1) the decay-based method, which gradually decreases the weight of the model's local gradients as local updating progresses, and 2) the consensus-based method, which introduces the consensus algorithm into federated learning for the exchange of the models' local gradients. This paper also provides novel convergence guarantees for both methods and demonstrates their effectiveness and efficiency through theoretical analysis and numerical simulation results.
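As a minimal sketch of the decay-based method, the following Python snippet down-weights each successive local gradient step with a geometric factor; the geometric schedule, the learning rate, and the decay constant are illustrative assumptions, since the abstract does not specify the exact decay rule.

```python
import numpy as np

def local_update_with_decay(w, local_grads, lr=0.01, decay=0.9):
    """One round of decay-based local updating: the weight alpha placed on
    each successive local gradient shrinks geometrically (an assumed
    schedule), so later local steps perturb the model less before the
    next server aggregation."""
    alpha = 1.0
    for g in local_grads:
        w = w - lr * alpha * g  # down-weighted local gradient step
        alpha *= decay          # decrease the weight as local updating progresses
    return w

# Toy usage: three local gradient steps on a 4-dimensional model.
w = local_update_with_decay(np.zeros(4), [np.ones(4)] * 3)
```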
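The consensus-based method replaces some server round-trips with peer-to-peer averaging of local gradients. A hypothetical sketch under a ring communication graph is shown below; the mixing matrix W (row-stochastic here) and the graph topology are assumptions for illustration, not the paper's exact protocol.

```python
import numpy as np

def consensus_exchange(local_grads, W):
    """One consensus round: each agent replaces its local gradient with a
    weighted average of its own and its neighbors' gradients, as encoded
    by the mixing matrix W over the communication graph."""
    G = np.stack(local_grads)  # shape: (num_agents, dim)
    return list(W @ G)         # one mixed gradient per agent

# Assumed ring graph over n agents: each agent averages its own gradient
# with those of its two neighbors.
n = 4
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1.0 / 3.0

mixed = consensus_exchange([np.random.randn(5) for _ in range(n)], W)
```

Repeated applications of such a mixing step drive the agents' gradients toward their network-wide average without routing every exchange through the central server.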