Cooperation among AI systems, and between AI systems and humans, is becoming increasingly important. In many real-world tasks, an agent must cooperate with partner agents of unknown types. This requires the agent to assess the partner agent's behaviour during the cooperative task and to adjust its own policy to support the cooperation. Deep reinforcement learning models can be trained to deliver the required functionality, but are known to suffer from sample inefficiency and slow learning. Adapting to a partner agent's behaviour during an ongoing task, moreover, requires the ability to assess the partner agent type quickly. We propose a method in which we synthetically generate populations of agents with different behavioural patterns, together with ground-truth data on their behaviour, and use this data to train a meta-learner. We additionally propose an agent architecture that can efficiently exploit the generated data and acquire the meta-learning capability. An agent equipped with such a meta-learner can quickly adapt to cooperation with unknown partner agent types in new situations. The method can also be used to automatically form a task distribution for meta-training from emergent behaviours, for example those arising through self-play.
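The pipeline described above — synthesise a population of partner types with ground-truth labels, collect their behaviour, and train a model that infers the partner type from a short interaction — can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the paper's actual architecture: each partner "type" is reduced to a fixed action distribution, and a nearest-centroid classifier stands in for the learned meta-learner.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy setting: 3 partner types, 4 discrete actions,
# and a 20-step observation window per episode.
N_TYPES, N_ACTIONS, EPISODE_LEN = 3, 4, 20

def make_population(n_types):
    """Synthesise agent types with ground-truth labels: each type is
    a distinct action-preference distribution the partner samples from."""
    return rng.dirichlet(np.ones(N_ACTIONS) * 0.5, size=n_types)

def rollout(pref):
    """Observe one short interaction: empirical action frequencies."""
    actions = rng.choice(N_ACTIONS, size=EPISODE_LEN, p=pref)
    return np.bincount(actions, minlength=N_ACTIONS) / EPISODE_LEN

# Meta-training data: behaviour summaries paired with ground-truth type labels.
prefs = make_population(N_TYPES)
X = np.array([rollout(prefs[t]) for t in range(N_TYPES) for _ in range(200)])
y = np.array([t for t in range(N_TYPES) for _ in range(200)])

# Stand-in "meta-learner": nearest-centroid type inference from a
# short behaviour window (a learned encoder would replace this).
centroids = np.array([X[y == t].mean(axis=0) for t in range(N_TYPES)])

def infer_type(obs):
    """Identify the partner type whose training behaviour is closest."""
    return int(np.argmin(np.linalg.norm(centroids - obs, axis=1)))

# At deployment: assess an unknown partner from one short episode; the
# agent would then adjust its policy to best respond to that type.
unknown_obs = rollout(prefs[1])
guessed_type = infer_type(unknown_obs)
```

The key point the sketch illustrates is that the task distribution for meta-training is manufactured, not hand-labelled: because the partner population is generated synthetically, ground-truth type labels come for free, and the same recipe applies when the population is instead harvested from emergent behaviours such as self-play checkpoints.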