Transfer learning in Reinforcement Learning (RL) has been widely studied as a way to overcome the training issues of Deep-RL, i.e., exploration cost, data availability and convergence time, by enhancing the training phase with external knowledge. Generally, knowledge is transferred from expert agents to novices. While this fixes the issue for a novice agent, the expert agent must have a good understanding of the task for such transfer to be effective. As an alternative, in this paper we propose Expert-Free Online Transfer Learning (EF-OnTL), an algorithm that enables expert-free, real-time, dynamic transfer learning in multi-agent systems. No dedicated expert exists; instead, the transfer source agent and the knowledge to be transferred are dynamically selected at each transfer step based on the agents' performance and uncertainty. To improve uncertainty estimation, we also propose State Action Reward Next-State Random Network Distillation (sars-RND), an extension of RND that estimates uncertainty from RL agent-environment interaction. We demonstrate the effectiveness of EF-OnTL against a no-transfer scenario and advice-based baselines, with and without expert agents, in three benchmark tasks: Cart-Pole, a grid-based Multi-Team Predator-Prey (mt-pp) and Half Field Offense (HFO). Our results show that EF-OnTL achieves overall performance comparable to the advice-based baselines while requiring neither external input nor threshold tuning. EF-OnTL outperforms no-transfer, with an improvement that depends on the complexity of the task addressed.
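To make the sars-RND idea concrete, here is a minimal sketch of Random Network Distillation applied to whole transitions, as the name suggests. It uses a frozen, randomly initialised target network and a trainable predictor. Both receive the concatenated (state, action, reward, next-state) vector, and the predictor's squared error against the target serves as the uncertainty score. The one-hot action encoding, the network sizes, the learning rate and the helper names `sars_vector`, `TinyMLP` and `SarsRND` are hypothetical choices made for illustration. This is not the paper's implementation.

```python
import math
import random


def sars_vector(state, action, reward, next_state, n_actions):
    """Concatenate (s, one-hot a, r, s') into one flat input (assumed encoding)."""
    one_hot = [1.0 if a == action else 0.0 for a in range(n_actions)]
    return list(state) + one_hot + [float(reward)] + list(next_state)


class TinyMLP:
    """One tanh hidden layer followed by a linear output layer (no biases)."""

    def __init__(self, n_in, n_hidden, n_out, rng):
        self.w1 = [[rng.gauss(0, 1 / math.sqrt(n_in)) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.gauss(0, 1 / math.sqrt(n_hidden)) for _ in range(n_hidden)]
                   for _ in range(n_out)]

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.w1]
        y = [sum(w * hj for w, hj in zip(row, h)) for row in self.w2]
        return h, y


class SarsRND:
    """Hypothetical sars-RND sketch: uncertainty = predictor error vs. a frozen target."""

    def __init__(self, n_in, n_hidden=16, n_out=8, lr=0.05, seed=0):
        rng = random.Random(seed)
        self.target = TinyMLP(n_in, n_hidden, n_out, rng)     # never updated
        self.predictor = TinyMLP(n_in, n_hidden, n_out, rng)  # trained by update()
        self.lr = lr

    def uncertainty(self, x):
        """Mean squared difference between predictor and target outputs."""
        _, t = self.target.forward(x)
        _, y = self.predictor.forward(x)
        return sum((yi - ti) ** 2 for yi, ti in zip(y, t)) / len(t)

    def update(self, x):
        """One SGD step on 0.5 * ||predictor(x) - target(x)||^2."""
        _, t = self.target.forward(x)
        h, y = self.predictor.forward(x)
        err = [yi - ti for yi, ti in zip(y, t)]
        p = self.predictor
        # Gradient w.r.t. hidden activations, computed before w2 is modified.
        dh = [sum(err[k] * p.w2[k][j] for k in range(len(err))) for j in range(len(h))]
        for k, e in enumerate(err):
            for j, hj in enumerate(h):
                p.w2[k][j] -= self.lr * e * hj
        for j, hj in enumerate(h):
            g = dh[j] * (1.0 - hj * hj)  # derivative of tanh
            for i, xi in enumerate(x):
                p.w1[j][i] -= self.lr * g * xi


if __name__ == "__main__":
    # A Cart-Pole-like transition: 4-d state, 2 actions, reward 1.0.
    x = sars_vector([0.1, -0.2, 0.05, 0.3], 1, 1.0, [0.12, -0.25, 0.04, 0.35], n_actions=2)
    rnd = SarsRND(n_in=len(x))
    print("before:", rnd.uncertainty(x))
    for _ in range(200):
        rnd.update(x)
    print("after:", rnd.uncertainty(x))
```

Training the predictor on a transition lowers that transition's score. Transitions the agent has rarely experienced therefore tend to keep a higher error. This is the property that an uncertainty-driven choice of source agent and knowledge could rely on.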