Successful deployment of multi-agent reinforcement learning often requires agents to adapt their behaviour. In this work, we discuss the problem of teamwork adaptation, in which a team of agents needs to adapt their policies to solve novel tasks with limited fine-tuning. Motivated by the intuition that agents need to be able to identify and distinguish tasks in order to adapt their behaviour to the current task, we propose to learn multi-agent task embeddings (MATE). These task embeddings are trained using an encoder-decoder architecture optimised for reconstruction of the transition and reward functions, which uniquely identify tasks. We show that a team of agents is able to adapt to novel tasks when provided with task embeddings. We propose three MATE training paradigms: independent MATE, centralised MATE, and mixed MATE, which vary in the information used for the task encoding. We show that the embeddings learned by MATE identify tasks and provide useful information which agents leverage during adaptation to novel tasks.
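As a rough illustration of the encoder-decoder idea described above, the following minimal sketch encodes a batch of transitions into a single task embedding and scores it by how well a decoder reconstructs the next observation and reward. The abstract gives no architectural details, so everything here is an assumption made for illustration: the linear layers, the mean-pooling over transitions, the squared-error loss, and the names `init_params`, `encode`, and `reconstruction_loss` are hypothetical and not taken from the paper.

```python
import numpy as np


def init_params(rng, obs_dim, act_dim, embed_dim):
    """Randomly initialise a linear encoder and a linear decoder (assumed shapes)."""
    enc_in = obs_dim + act_dim + 1 + obs_dim   # (obs, action, reward, next_obs)
    dec_in = embed_dim + obs_dim + act_dim     # (embedding, obs, action)
    return {
        "W_enc": rng.normal(0.0, 0.1, (enc_in, embed_dim)),
        "W_dec": rng.normal(0.0, 0.1, (dec_in, obs_dim + 1)),
    }


def encode(params, obs, act, rew, next_obs):
    """Map a batch of transitions to one task embedding by mean-pooling."""
    x = np.concatenate([obs, act, rew[:, None], next_obs], axis=1)
    return np.tanh(x @ params["W_enc"]).mean(axis=0)


def reconstruction_loss(params, obs, act, rew, next_obs):
    """Squared error of predicted next observation and reward given the embedding."""
    z = encode(params, obs, act, rew, next_obs)
    z_tiled = np.tile(z, (obs.shape[0], 1))    # same embedding for every transition
    dec_in = np.concatenate([z_tiled, obs, act], axis=1)
    pred = dec_in @ params["W_dec"]
    pred_next_obs, pred_rew = pred[:, :-1], pred[:, -1]
    transition_err = np.mean((pred_next_obs - next_obs) ** 2)
    reward_err = np.mean((pred_rew - rew) ** 2)
    return transition_err + reward_err
```

In this sketch the encoder sees one stream of transitions. Which agents' observations, actions, and rewards are fed to the encoder is the kind of choice that the independent, centralised, and mixed paradigms would vary; the abstract does not specify the exact inputs of each.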