Ad hoc teamwork (AHT) is the problem of creating an agent that must collaborate with previously unseen teammates without prior coordination. Many existing AHT methods can be categorised as type-based methods, which require a set of predefined teammates for training. Designing these training teammate types is challenging, and the choice determines how well an agent generalises to teammate types unseen during training. In this work, we propose a method that discovers diverse teammate types by maximising best-response diversity metrics. We show that our proposed approach yields teammate types that require a wider range of best responses from the learner during collaboration, which potentially improves the robustness of the learner's performance in AHT compared to alternative methods.