Text-based games provide an interactive way to study natural language processing. While deep reinforcement learning (DRL) has proven effective for developing game-playing agents, low sample efficiency and large action spaces remain the two major challenges that hinder DRL from being applied in the real world. In this paper, we address these challenges by introducing world-perceiving modules, which automatically decompose tasks and prune actions by answering questions about the environment. We then propose a two-phase training framework that decouples language learning from reinforcement learning, further improving sample efficiency. Experimental results show that the proposed method significantly improves both performance and sample efficiency. Moreover, it demonstrates robustness to compounding errors and limited pre-training data.
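To make the action-pruning idea concrete, the following is a minimal, purely illustrative sketch, not the paper's actual implementation: a "world-perceiving" module is modeled as a yes/no question answerer over the textual observation, and candidate actions whose relevance question is answered "no" are pruned. The function names (`answer`, `prune_actions`) and the string-matching QA stand-in are hypothetical; in the actual method the answerer would be a learned language model.

```python
# Hypothetical sketch of QA-based action pruning (illustrative only).
# The QA module here is a trivial substring check standing in for a
# learned question-answering model over the environment description.

def answer(question: str, observation: str) -> bool:
    """Toy QA module: answers whether the quoted object in the
    question is mentioned in the current observation text."""
    obj = question.split("'")[1]  # extract the quoted object name
    return obj in observation

def prune_actions(actions, observation):
    """Keep only actions the QA module deems applicable, shrinking
    the action space the RL agent must explore."""
    kept = []
    for act in actions:
        obj = act.split()[-1]  # assume the last word names the object
        question = f"Is there a '{obj}' here?"
        if answer(question, observation):
            kept.append(act)
    return kept

obs = "You are in a kitchen. There is a fridge and a table."
acts = ["open fridge", "take apple", "examine table"]
print(prune_actions(acts, obs))  # → ['open fridge', 'examine table']
```

In the two-phase framework described above, such a QA module would be pre-trained on language data first, so the RL phase only has to explore the pruned action set.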