Reinforcement learning (RL) agents are particularly hard to train when rewards are sparse. One common solution is to use intrinsic rewards to encourage agents to explore their environment. However, recent intrinsic exploration methods often use state-based novelty measures which reward low-level exploration and may not scale to domains requiring more abstract skills. Instead, we explore natural language as a general medium for highlighting relevant abstractions in an environment. Unlike previous work, we evaluate whether language can improve over existing exploration methods by directly extending (and comparing to) competitive intrinsic exploration baselines: AMIGo (Campero et al., 2021) and NovelD (Zhang et al., 2021). These language-based variants outperform their non-linguistic forms by 47-85% across 13 challenging tasks from the MiniGrid and MiniHack environment suites.