In this work, our goal is to train agents that can coordinate with seen, unseen, and human partners in a multi-agent communication environment involving natural language. Previous work using a single set of agents has shown great progress in generalizing to known partners; however, it struggles when coordinating with unfamiliar agents. To mitigate this, recent work has explored population-based approaches, in which multiple agents interact with each other with the goal of learning more generic protocols. These methods, while able to produce good coordination between unseen partners, achieve this only for simple languages, and thus fail to adapt to human partners using natural language. We attribute this to the use of static populations and instead propose a dynamic population-based meta-learning approach that builds such a population iteratively. We perform a holistic evaluation of our method on two different referential games and show that our agents outperform all prior work when communicating with seen partners and humans. Furthermore, we analyze the natural language generation skills of our agents and find that they also outperform strong baselines. Finally, we test the robustness of our agents when communicating with out-of-population agents and carefully test the importance of each component of our method through ablation studies.