The ability of an AI agent to assist other agents, such as humans, is an important and challenging goal: it requires the assisting agent to reason about the behavior of the assisted agent and to infer its goals. Training such an ability with reinforcement learning usually requires large amounts of online interaction, which is difficult and costly. On the other hand, offline data about the behavior of the assisted agent may be available, but it is non-trivial to exploit with methods such as offline reinforcement learning. We introduce methods in which the capability to represent the assisted agent's behavior is first pre-trained on offline data, after which only a small amount of interaction data is needed to learn an assisting policy. We evaluate the setting in a gridworld where the helper agent can manipulate the environment of the assisted artificial agents, and introduce three scenarios in which the assistance considerably improves the performance of the assisted agents.