A core issue in federated reinforcement learning is how to aggregate the insights of multiple agents into a single model. This is commonly done by averaging the model weights of every participating agent into one common model (FedAvg). We instead propose FedFormer, a novel federation strategy that uses Transformer attention to contextually aggregate embeddings from models originating at different learner agents. In doing so, we attentively weigh the contributions of other agents with respect to the current agent's environment and learned relationships, providing more effective and efficient federation. We evaluate our methods on the Meta-World environment and find that our approach yields significant improvements over FedAvg and over non-federated Soft Actor-Critic single-agent methods. Compared to Soft Actor-Critic, FedFormer performs better while still abiding by the privacy constraints of federated learning. In addition, on certain tasks we demonstrate nearly linear gains in effectiveness as the agent pool grows, whereas FedAvg fails to make noticeable improvements when scaled.
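To make the contrast concrete, the following is a minimal sketch (not the paper's implementation) of the two aggregation styles described above: FedAvg averages model weights directly, while an attention-based federation fuses per-agent embeddings contextually. The class name `AttentiveFederation`, the dimensions, and the use of PyTorch's `nn.MultiheadAttention` are assumptions made for illustration only.

```python
import torch
import torch.nn as nn


class AttentiveFederation(nn.Module):
    """Sketch of attention-based federation: aggregate per-agent state
    embeddings contextually, rather than averaging model weights."""

    def __init__(self, embed_dim: int = 64, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, own_embedding: torch.Tensor, peer_embeddings: torch.Tensor) -> torch.Tensor:
        # own_embedding:   (batch, embed_dim)            current agent's encoding of its state
        # peer_embeddings: (batch, num_peers, embed_dim) encodings produced by other agents' encoders
        query = own_embedding.unsqueeze(1)                # (batch, 1, embed_dim)
        keys = torch.cat([query, peer_embeddings], dim=1) # attend over self and peers
        fused, _ = self.attn(query, keys, keys)           # contextually weighted aggregation
        return fused.squeeze(1)                           # (batch, embed_dim) fused representation


# For contrast, FedAvg-style aggregation averages the agents' model weights directly:
def fedavg(state_dicts):
    return {
        k: torch.stack([sd[k].float() for sd in state_dicts]).mean(dim=0)
        for k in state_dicts[0]
    }
```

The key design difference this sketch highlights: FedAvg produces one shared set of weights regardless of the querying agent, whereas the attention-based aggregation lets each agent weigh peer contributions relative to its own current observation.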