To perform next-word prediction effectively, long short-term memory networks (LSTMs) must keep track of many types of information. Some of this information is directly related to the next word's identity, while some is secondary (e.g., discourse-level features or features of downstream words). Correlates of secondary information appear in LSTM representations even though they are not part of an \emph{explicitly} supervised prediction task. In contrast, in reinforcement learning (RL), techniques that explicitly supervise representations to predict secondary information have been shown to be beneficial. Inspired by that success, we propose Predictive Representation Learning (PRL), which explicitly constrains LSTMs to encode specific predictions, such as those that might otherwise need to be learned implicitly. We show that PRL (1) significantly improves two strong language modeling methods, (2) converges more quickly, and (3) performs better when data is limited. Our work shows that explicitly encoding a simple predictive task facilitates the search for a more effective language model.