We investigate the use of natural language to drive the generalization of policies in multi-agent settings. Unlike single-agent settings, the generalization of policies in multi-agent settings must also account for the influence of other agents. Moreover, as the number of entities grows, more agent-entity interactions are needed for language grounding, and the enormous search space can impede the learning process. In addition, given a simple general instruction, e.g., beating all enemies, agents must decompose it into multiple subgoals and figure out the right one to focus on. Inspired by previous work, we address these issues at the entity level and propose a novel framework for language grounding in multi-agent reinforcement learning, entity divider (EnDi). EnDi enables agents to independently learn subgoal division at the entity level and to act in the environment based on the associated entities. The subgoal division is regularized by opponent modeling to avoid subgoal conflicts and to promote coordinated strategies. Empirically, EnDi demonstrates strong generalization to unseen games with new dynamics and outperforms existing methods.