A common approach to prediction and planning in partially observable domains is to use recurrent neural networks (RNNs), which ideally develop and maintain a latent memory about hidden, task-relevant factors. We hypothesize that many of these hidden factors in the physical world are constant over time, changing only sparsely. To study this hypothesis, we propose Gated $L_0$ Regularized Dynamics (GateL0RD), a novel recurrent architecture that incorporates the inductive bias to maintain stable, sparsely changing latent states. The bias is implemented by means of a novel internal gating function and a penalty on the $L_0$ norm of latent state changes. We demonstrate that GateL0RD can compete with or outperform state-of-the-art RNNs in a variety of partially observable prediction and control tasks. GateL0RD tends to encode the underlying generative factors of the environment, ignores spurious temporal dependencies, and generalizes better, improving sample efficiency and overall performance in model-based planning and reinforcement learning tasks. Moreover, we show that the developing latent states can be easily interpreted, which is a step towards better explainability in RNNs.
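To make the core inductive bias concrete, the following is a minimal conceptual sketch of a recurrent step whose latent state changes only where a hard gate opens, together with an $L_0$-style penalty that counts how many latent dimensions changed. This is an illustrative simplification, not the authors' exact architecture: the function names, weight shapes, and the hard-threshold gate are assumptions for exposition (the actual model uses a learned, differentiable gating mechanism).

```python
import numpy as np

def gated_step(h_prev, x, W_g, W_c, threshold=0.0):
    """One conceptual recurrent step with a hard update gate.

    Per latent dimension, the gate decides whether the state changes
    at all; dimensions with a closed gate keep their previous value
    exactly, so latent state changes are sparse over time.
    """
    inp = np.concatenate([h_prev, x])
    gate = (W_g @ inp > threshold).astype(float)   # hard 0/1 gate per dimension
    candidate = np.tanh(W_c @ inp)                 # proposed new latent state
    h_new = gate * candidate + (1.0 - gate) * h_prev
    return h_new, gate

def l0_change_penalty(gates):
    """L_0-style regularizer: total count of latent dimensions
    that changed across the sequence (open gates)."""
    return float(sum(g.sum() for g in gates))
```

With the gate weights at zero, every gate stays closed, the latent state is copied forward unchanged, and the penalty is zero; training then trades prediction accuracy against the number of state changes.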