Social learning is a key component of human and animal intelligence. By taking cues from the behavior of experts in their environment, social learners can acquire sophisticated behavior and rapidly adapt to new circumstances. This paper investigates whether independent reinforcement learning (RL) agents in a multi-agent environment can learn to use social learning to improve their performance. We find that in most circumstances, vanilla model-free RL agents do not use social learning. We analyze the reasons for this deficiency, and show that by imposing constraints on the training environment and introducing a model-based auxiliary loss we are able to obtain generalized social learning policies which enable agents to: i) discover complex skills that are not learned from single-agent training, and ii) adapt online to novel environments by taking cues from experts present in the new environment. In contrast, agents trained with model-free RL or imitation learning generalize poorly and do not succeed in the transfer tasks. By mixing multi-agent and solo training, we can obtain agents that use social learning to gain skills that they can deploy when alone, even outperforming agents trained alone from the start.
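The model-based auxiliary loss mentioned above can be illustrated with a minimal sketch: alongside the usual policy-gradient objective, the agent is also trained to predict its next observation, which includes the behavior of other agents in a multi-agent environment. The function name, the mean-squared-error form of the prediction term, and the `aux_weight` coefficient are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def combined_loss(log_probs, advantages, pred_next_obs, true_next_obs,
                  aux_weight=0.5):
    """Policy-gradient surrogate loss plus a model-based auxiliary term.

    The auxiliary term rewards accurate prediction of the next observation
    (which, in a multi-agent environment, includes experts' behavior).
    This sketch uses a REINFORCE-style surrogate and MSE prediction error;
    both choices and the weighting are assumptions for illustration.
    """
    pg_loss = -np.mean(log_probs * advantages)                 # policy-gradient term
    aux_loss = np.mean((pred_next_obs - true_next_obs) ** 2)   # next-observation prediction
    return pg_loss + aux_weight * aux_loss
```

In practice both terms would be minimized jointly by a shared network, so gradients from the prediction task shape the representation used by the policy; the sketch only shows how the two objectives combine into one scalar loss.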