A common approach to prediction and planning in partially observable domains is to use recurrent neural networks (RNNs), which ideally develop and maintain a latent memory about hidden, task-relevant factors. We hypothesize that many of these hidden factors in the physical world are constant over time, changing only sparsely. Accordingly, we propose Gated $L_0$ Regularized Dynamics (GateL0RD), a novel recurrent architecture that incorporates the inductive bias to maintain stable, sparsely changing latent states. The bias is implemented by means of a novel internal gating function and a penalty on the $L_0$ norm of latent state changes. We demonstrate that GateL0RD can compete with or outperform state-of-the-art RNNs in a variety of partially observable prediction and control tasks. GateL0RD tends to encode the underlying generative factors of the environment, ignores spurious temporal dependencies, and generalizes better, improving sampling efficiency and prediction accuracy as well as behavior in model-based planning and reinforcement learning tasks. Moreover, we show that the developing latent states can be easily interpreted, which is a step towards better explainability in RNNs.
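To make the core inductive bias concrete, here is a minimal, non-trainable sketch of a hard-gated latent update: each latent dimension is either overwritten by a candidate value or kept unchanged, and the penalty simply counts how many dimensions changed (the $L_0$ norm of the state change). This is only an illustration of the intuition; the paper's actual gating function is a learned component of the recurrent cell, and all names below are hypothetical.

```python
import numpy as np

def gated_update(h_prev, h_cand, gate, theta=0.0):
    """Toy hard-gated latent update illustrating sparsely changing states.

    h_prev: previous latent state vector
    h_cand: candidate new latent state vector
    gate:   per-dimension gate activations; dimensions with
            gate <= theta keep their previous value
    """
    lam = (gate > theta).astype(float)          # binary open/closed gates
    h_new = lam * h_cand + (1.0 - lam) * h_prev
    # L0-style penalty: number of latent dimensions that actually changed
    l0_penalty = int(np.count_nonzero(h_new != h_prev))
    return h_new, l0_penalty
```

Minimizing such a count alongside the prediction loss pushes the network to update its memory only when a hidden factor truly changes; since the hard count is non-differentiable, a trainable version would need a smoothed or stochastic surrogate for the gate.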