Social learning is a key component of human and animal intelligence. By taking cues from the behavior of experts in their environment, social learners can acquire sophisticated behavior and rapidly adapt to new circumstances. This paper investigates whether independent reinforcement learning (RL) agents in a multi-agent environment can use social learning to improve their performance using cues from other agents. We find that in most circumstances, vanilla model-free RL agents do not use social learning, even in environments in which individual exploration is expensive. We analyze the reasons for this deficiency, and show that by introducing a model-based auxiliary loss we are able to train agents to leverage cues from experts to solve hard exploration tasks. The generalized social learning policy learned by these agents allows them to not only outperform the experts with which they trained, but also achieve better zero-shot transfer performance than solo learners when deployed to novel environments with experts. In contrast, agents that have not learned to rely on social learning generalize poorly and do not succeed in the transfer task. Further, we find that by mixing multi-agent and solo training, we can obtain agents that use social learning to outperform agents trained alone, even when experts are not available. This demonstrates that social learning has helped improve agents' representation of the task itself. Our results indicate that social learning can enable RL agents to not only improve performance on the task at hand, but also improve generalization to novel environments.