Reinforcement learning faces an important challenge in partially observable environments that have long-term dependencies. To learn in an ambiguous environment, an agent has to keep previous perceptions in memory. Earlier memory-based approaches use a fixed method to determine what to keep in memory, which limits them to certain problems. In this study, we follow the idea of giving control of the memory to the agent by allowing it to take memory-changing actions. This learning mechanism is supported by an intrinsic motivation to memorize rare observations that can help the agent disambiguate its state in the environment. Our approach is evaluated and analyzed on several partially observable tasks with long-term dependencies and compared with other memory-based methods.