We introduce a human-compatible reinforcement-learning approach to a cooperative game, using a third-party, hand-coded, human-compatible bot to generate initial training data and to perform initial evaluation. Our learning approach consists of imitation learning, search, and policy iteration. Our trained agents achieve a new state of the art for bridge bidding in three settings: an agent playing in partnership with a copy of itself; an agent partnering a pre-existing bot; and an agent partnering a human player.