General Value Function Networks

State construction is important for learning in partially observable environments. A general-purpose strategy for state construction is to learn the state update with a Recurrent Neural Network (RNN), which computes the next internal state from the current internal state and the most recent observation. This internal state summarizes the observed sequence and so supports accurate predictions and decision-making. Specifying and training RNNs is notoriously tricky, however, particularly because the common strategy for approximating gradients back in time, truncated Backpropagation Through Time (BPTT), can be sensitive to the truncation window. Further, domain expertise, which can usually help constrain the function class and so improve trainability, can be difficult to incorporate into the complex recurrent units used within RNNs. In this work, we explore how to use multi-step predictions to constrain the RNN and incorporate prior knowledge. In particular, we revisit the idea of using predictions to construct state and ask: does constraining (parts of) the state to consist of predictions about the future improve RNN trainability? We formulate a novel RNN architecture, called a General Value Function Network (GVFN), in which each internal state component corresponds to a prediction about the future, represented as a value function. We first provide an objective for optimizing GVFNs and derive several algorithms to optimize this objective. We then show that GVFNs are more robust to the truncation level, in many cases requiring only one-step gradient updates.
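
To make the idea of a state built from predictions concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a single sigmoid recurrent layer whose outputs are the state, and it trains each state component with a one-step semi-gradient TD(0) update, which corresponds to a truncation window of one. The class name `TinyGVFN`, the choice of cumulants, and the toy cyclic observation stream are all hypothetical and exist only for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class TinyGVFN:
    """Hypothetical sketch: every state component is a value-function prediction."""

    def __init__(self, n_obs, n_gvfs, gammas, step_size=0.1, seed=0):
        rng = np.random.default_rng(seed)
        # One weight row per prediction; inputs are [previous state, observation, bias].
        self.W = rng.normal(scale=0.1, size=(n_gvfs, n_gvfs + n_obs + 1))
        self.gammas = np.asarray(gammas, dtype=float)
        self.step_size = step_size

    def features(self, state, obs):
        return np.concatenate([state, obs, [1.0]])

    def step(self, state, obs):
        # Recurrent state update: next state from current state and latest observation.
        return sigmoid(self.W @ self.features(state, obs))

    def td_update(self, prev_state, obs, next_obs, cumulants):
        x = self.features(prev_state, obs)
        state = sigmoid(self.W @ x)
        next_state = self.step(state, next_obs)
        # One-step TD target per component: cumulant plus discounted next prediction.
        target = cumulants + self.gammas * next_state
        delta = target - state
        # Semi-gradient: prev_state and the target are treated as constants,
        # so no gradient flows further back in time (assumption of this sketch).
        grad = (delta * state * (1.0 - state))[:, None] * x[None, :]
        self.W += self.step_size * grad
        return state, delta


# Toy usage (hypothetical): one-hot observations cycling with period 3;
# each component predicts one bit of the next observation (discount 0).
net = TinyGVFN(n_obs=3, n_gvfs=3, gammas=[0.0, 0.0, 0.0], step_size=0.5)
obs_cycle = np.eye(3)
state = np.zeros(3)
errors = []
for t in range(3000):
    obs = obs_cycle[t % 3]
    next_obs = obs_cycle[(t + 1) % 3]
    state, delta = net.td_update(state, obs, next_obs, cumulants=next_obs)
    errors.append(np.abs(delta).mean())
```

With discounts of zero, each component is a one-step prediction. Nonzero discounts would turn the components into multi-step predictions, in which case the cumulants would need scaling so that the targets stay within the sigmoid's range.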
