To solve difficult tasks, humans ask questions to acquire knowledge from external sources. In contrast, classical reinforcement learning agents lack such an ability and often resort to exploratory behavior. This limitation is exacerbated by the fact that few present-day environments support querying for knowledge. To study how agents can be taught to query external knowledge via language, we first introduce two new environments: the grid-world-based Q-BabyAI and the text-based Q-TextWorld. In addition to physical interactions, an agent can query an external knowledge source specialized for these environments to gather information. Second, we propose the "Asking for Knowledge" (AFK) agent, which learns to generate language commands that query for meaningful knowledge to help solve the tasks. AFK leverages a non-parametric memory, a pointer mechanism, and an episodic exploration bonus to tackle (1) irrelevant information, (2) a large query language space, and (3) delayed rewards for making meaningful queries. Extensive experiments demonstrate that the AFK agent outperforms recent baselines on the challenging Q-BabyAI and Q-TextWorld environments.
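To illustrate the idea of an episodic exploration bonus for queries, the following is a minimal sketch, not the AFK implementation described in the paper: it rewards query commands the agent has not yet issued in the current episode, with the bonus decaying as the same query is repeated. The class name, bonus scale, and counting scheme are assumptions for illustration only.

```python
from collections import Counter


class EpisodicQueryBonus:
    """Count-based intrinsic reward over query strings, reset each episode (hypothetical sketch)."""

    def __init__(self, scale: float = 0.1):
        self.scale = scale
        self.counts = Counter()

    def reset(self):
        """Call at the start of every episode."""
        self.counts.clear()

    def bonus(self, query: str) -> float:
        """Return a bonus that decays with repeated use of the same query within the episode."""
        self.counts[query] += 1
        return self.scale / self.counts[query] ** 0.5


# Usage: add the bonus to the environment reward whenever the agent issues a query.
bonus_fn = EpisodicQueryBonus()
bonus_fn.reset()
r_intrinsic = bonus_fn.bonus("where is the red key")
```

Such a count-based bonus is one standard way to counteract the delayed reward for making meaningful queries, since novel queries receive immediate intrinsic credit even before they pay off extrinsically.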