One of the fundamental quests of AI is to produce agents that coordinate well with humans. This problem is challenging, especially in domains that lack high-quality human behavioral data, because multi-agent reinforcement learning (RL) often converges to equilibria that differ from the ones humans prefer. We propose a novel framework, instructRL, that enables humans to specify what kinds of strategies they expect from their AI partners through natural language instructions. We use pretrained large language models to generate a prior policy conditioned on the human instruction and use the prior to regularize the RL objective. This leads the RL agent to converge to equilibria that are aligned with human preferences. We show that instructRL converges to human-like policies that satisfy the given instructions in both a proof-of-concept environment and the challenging Hanabi benchmark. Finally, we show that knowing the language instruction significantly boosts human-AI coordination performance in human evaluations on Hanabi.
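As a rough illustration of how an instruction-conditioned prior can regularize the RL objective (a sketch under our own assumptions, not the paper's exact formulation; the symbols $\lambda$ and $\pi_{\mathrm{LLM}}$ are illustrative), one natural form is a KL-regularized objective:

\[
\max_{\pi} \; \mathbb{E}_{\pi}\!\left[\sum_{t} r_t\right] \;-\; \lambda \, \mathrm{KL}\!\big(\pi \,\|\, \pi_{\mathrm{LLM}}\big),
\]

where $\pi_{\mathrm{LLM}}$ denotes the prior policy obtained by conditioning the pretrained language model on the human instruction, and $\lambda$ controls how strongly the learned policy is pulled toward the instructed behavior.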