Reinforcement learning (RL) has been used in a range of simulated real-world tasks, e.g., sensor coordination, traffic light control, and on-demand mobility services. However, real-world deployments are rare, as RL struggles with the dynamic nature of real-world environments, requiring time to learn a task and adapt to changes in the environment. Transfer Learning (TL) can help lower these adaptation times. In particular, there is significant potential in applying TL to multi-agent RL systems, where multiple agents can share knowledge with each other, as well as with new agents that join the system. To obtain the most from inter-agent transfer, transfer roles (i.e., determining which agents act as sources and which as targets), as well as relevant transfer content parameters (e.g., transfer size), should be selected dynamically in each particular situation. As a first step towards fully dynamic transfers, in this paper we investigate the impact of TL transfer parameters with fixed source and target roles. Specifically, we label every agent-environment interaction with the agent's epistemic confidence, and we filter the shared examples using varying threshold levels and sample sizes. We investigate the impact of these parameters in two scenarios: a standard predator-prey RL benchmark and a simulation of a ride-sharing system with 200 vehicle agents and 10,000 ride requests.
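The confidence-based filtering described above can be illustrated with a minimal sketch. All names and the experience format here are hypothetical (the paper does not specify an implementation): each stored agent-environment interaction is assumed to carry a `confidence` label, and a transfer batch is formed by keeping only interactions above a threshold and subsampling to the desired transfer size.

```python
import random

def select_transfer_batch(experiences, confidence_threshold, sample_size, rng=None):
    """Filter a source agent's labeled experiences by epistemic confidence,
    then subsample down to the requested transfer (sample) size."""
    rng = rng or random.Random(0)
    # Keep only interactions the source agent is sufficiently confident about.
    confident = [e for e in experiences if e["confidence"] >= confidence_threshold]
    if len(confident) <= sample_size:
        return confident
    return rng.sample(confident, sample_size)

# Hypothetical experience buffer: a transition plus a confidence label in [0, 1].
experiences = [
    {"transition": (s, 0, 1.0, s + 1), "confidence": s / 10}
    for s in range(10)
]
batch = select_transfer_batch(experiences, confidence_threshold=0.5, sample_size=3)
```

Varying `confidence_threshold` and `sample_size` corresponds to the two transfer parameters whose impact the paper investigates.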