Due to its perceptual limitations, an agent may have too little information about the state of the environment to act optimally. In such cases, it is important to keep track of the observation history to uncover hidden state. Recent deep reinforcement learning methods use recurrent neural networks (RNNs) to memorize past observations. However, these models are expensive to train and have convergence difficulties, especially when dealing with high-dimensional input spaces. In this paper, we propose influence-aware memory (IAM), a theoretically inspired memory architecture that alleviates these training difficulties by restricting the input of the recurrent layers to those variables that influence the hidden state information. Moreover, as opposed to standard RNNs, in which every piece of information used for estimating Q values is inevitably fed back into the network for the next prediction, our model allows information to flow without necessarily being stored in the RNN's internal memory. Results indicate that, by letting the recurrent layers focus on a small fraction of the observation variables while processing the rest of the information with a feedforward neural network, we can outperform standard recurrent architectures both in training speed and policy performance. This approach also reduces runtime and obtains better scores than methods that stack multiple observations to remove partial observability.
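To make the idea concrete, the sketch below shows one possible reading of the architecture described above: a recurrent layer receives only a small subset of the observation variables (those assumed to influence the hidden state), while a feedforward branch processes the full observation, and both streams are concatenated before the Q-value head. This is a minimal PyTorch illustration; the layer sizes, the fixed index-based split, and all names (e.g. `InfluenceAwareMemory`, `influence_dim`) are assumptions for exposition, not the authors' exact model.

```python
import torch
import torch.nn as nn

class InfluenceAwareMemory(nn.Module):
    """Minimal sketch of the influence-aware memory (IAM) idea (assumed details)."""

    def __init__(self, obs_dim, influence_dim, hidden_dim=64, num_actions=4):
        super().__init__()
        self.influence_dim = influence_dim              # variables routed to the RNN (assumed split)
        # Feedforward branch: processes the full observation without storing it in memory.
        self.fnn = nn.Sequential(nn.Linear(obs_dim, hidden_dim), nn.ReLU())
        # Recurrent branch: sees only the influence variables.
        self.rnn = nn.GRU(influence_dim, hidden_dim, batch_first=True)
        # Q-value head on the concatenation of both streams.
        self.q_head = nn.Linear(2 * hidden_dim, num_actions)

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, obs_dim); only the first `influence_dim`
        # variables are fed into the RNN's internal memory.
        rnn_out, hidden = self.rnn(obs_seq[..., :self.influence_dim], hidden)
        fnn_out = self.fnn(obs_seq)                     # information flows without being memorized
        q_values = self.q_head(torch.cat([fnn_out, rnn_out], dim=-1))
        return q_values, hidden
```

In this reading, gradients through time only propagate along the small recurrent branch, which is consistent with the claim that restricting the RNN input reduces training cost.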