A core issue in multi-agent federated reinforcement learning is defining how to aggregate insights from multiple agents. This is commonly done by averaging each participating agent's model weights into one common model (FedAvg). We instead propose FedFormer, a novel federation strategy that uses Transformer attention to contextually aggregate embeddings from models originating at different learner agents. In doing so, we attentively weigh the contributions of other agents with respect to the current agent's environment and learned relationships, yielding a more effective and efficient federation. We evaluate our method on the Meta-World environment and find that it yields significant improvements over FedAvg and non-federated Soft Actor-Critic single-agent baselines. Compared to Soft Actor-Critic, FedFormer achieves higher episodic return while still abiding by the privacy constraints of federated learning. Finally, we demonstrate that FedFormer's effectiveness improves with larger agent pools on certain tasks, in contrast to FedAvg, which fails to make noticeable improvements when scaled.
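To make the contrast concrete, the following is a minimal sketch using standard formulas rather than the paper's exact notation: FedAvg averages the parameters $\theta_k$ of the $K$ participating agents (weighted here by data counts $n_k$), whereas an attention-based federation combines per-agent embeddings $e_k$ with weights derived from the current agent $i$'s query; the projection matrices $W_Q, W_K, W_V$ and dimension $d$ are illustrative placeholders.

\[
\theta_{\text{FedAvg}} \;=\; \sum_{k=1}^{K} \frac{n_k}{\sum_{j} n_j}\,\theta_k,
\qquad
z_i \;=\; \sum_{k=1}^{K} \alpha_{ik}\, W_V e_k,
\quad
\alpha_{ik} \;=\; \operatorname{softmax}_k\!\left(\frac{(W_Q e_i)^{\top} (W_K e_k)}{\sqrt{d}}\right).
\]

The key difference is that the attention weights $\alpha_{ik}$ depend on the current agent's own representation, so contributions from other agents are weighted contextually rather than uniformly.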