Effective coordination among artificial agents in dynamic and uncertain environments remains a significant challenge in multi-agent systems. Existing approaches, such as self-play and population-based methods, either generalize poorly to unseen partners or require impractically extensive fine-tuning. To overcome these limitations, we propose Coordination Transformers (\coot), a novel in-context coordination framework that uses recent interaction histories to rapidly adapt to unseen partners. Unlike prior approaches that primarily aim to diversify training partners, \coot explicitly focuses on adapting to new partner behaviors by predicting actions aligned with observed interactions. Trained on trajectories collected from diverse pairs of agents with complementary preferences, \coot quickly learns effective coordination strategies without explicit supervision or parameter updates. Across diverse coordination tasks in Overcooked, \coot consistently outperforms baselines including population-based approaches, gradient-based fine-tuning, and a Meta-RL-inspired contextual adaptation method. Notably, fine-tuning proves unstable and ineffective, while Meta-RL struggles to achieve reliable coordination. By contrast, \coot achieves stable, rapid in-context adaptation and is consistently ranked the most effective collaborator in human evaluations.