We introduce Lookup-Table Language Models (LookupLM), a method for scaling up the size of RNN language models with only a constant increase in floating point operations, by increasing the expressivity of the embedding table. In particular, we instantiate an additional embedding table which embeds the previous n-gram token sequence, rather than a single token. This allows the embedding table to be scaled up arbitrarily -- with a commensurate increase in performance -- without changing the token vocabulary. Since embeddings are sparsely retrieved from the table via a lookup, increasing the size of the table adds neither extra operations to each forward pass nor extra parameters that need to be stored on limited GPU/TPU memory. We explore scaling n-gram embedding tables up to nearly a billion parameters. When trained on a 3-billion sentence corpus, we find that LookupLM improves long tail log perplexity by 2.44 and long tail WER by 23.4% on a downstream speech recognition task over a standard RNN language model baseline, an improvement comparable to scaling up the baseline by 6.2x in floating point operations.
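To make the mechanism concrete, the following is a minimal PyTorch sketch of how a large, sparsely accessed n-gram embedding table can sit alongside a standard token embedding feeding an RNN. The class name, the rolling-hash indexing of n-grams into the table, and the concatenation of the two embeddings are illustrative assumptions for this sketch, not the paper's exact implementation.

```python
# Hypothetical sketch of LookupLM-style n-gram lookup embeddings.
# The hashing scheme and the way the two embeddings are combined are
# assumptions; the abstract only specifies that the previous n-gram is
# embedded via a (sparse) table lookup in addition to the token embedding.
import torch
import torch.nn as nn


class LookupLMSketch(nn.Module):
    def __init__(self, vocab_size=32000, ngram_table_size=50_000_000,
                 embed_dim=512, hidden_dim=1024, n=3):
        super().__init__()
        self.n = n
        self.token_embed = nn.Embedding(vocab_size, embed_dim)
        # Large n-gram table: rows are retrieved by hashing the previous n
        # tokens, so growing the table adds no per-step FLOPs.
        self.ngram_embed = nn.Embedding(ngram_table_size, embed_dim)
        self.rnn = nn.LSTM(2 * embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def hash_ngrams(self, tokens):
        # tokens: (batch, seq). Hash each window of the n most recent tokens
        # into an index of the n-gram table (assumed rolling polynomial hash).
        pad = tokens.new_zeros(tokens.size(0), self.n - 1)
        padded = torch.cat([pad, tokens], dim=1)
        idx = torch.zeros_like(tokens)
        for k in range(self.n):
            idx = idx * 1000003 + padded[:, k:k + tokens.size(1)]
        return idx % self.ngram_embed.num_embeddings

    def forward(self, tokens):
        tok = self.token_embed(tokens)                       # dense per-token embedding
        ngram = self.ngram_embed(self.hash_ngrams(tokens))   # sparse n-gram lookup
        h, _ = self.rnn(torch.cat([tok, ngram], dim=-1))
        return self.out(h)                                   # next-token logits
```

Because only one row of the n-gram table is gathered per position, the table size (here a hypothetical 50M rows) can grow without increasing the per-step compute of the RNN, which is the scaling property the abstract describes.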