The ad hoc teamwork problem describes situations in which an agent must cooperate with previously unseen agents to achieve a common goal. To succeed in these scenarios, an agent needs suitable cooperative skills. One could build such skills into an agent by using domain knowledge to design its behavior. In complex domains, however, domain knowledge might not be available, so it is worthwhile to explore how cooperative skills can be learned directly from data. In this work, we apply a meta-reinforcement learning (meta-RL) formulation to the ad hoc teamwork problem. Our empirical results show that this approach can produce robust cooperative agents in two cooperative environments that demand different kinds of cooperation: social compliance and language interpretation. (This is the full-paper version of an extended abstract.)