Including memory banks in a natural language processing architecture increases model capacity by equipping it with additional data at inference time. In this paper, we build upon $k$NN-LM \citep{khandelwal20generalization}, which uses a pre-trained language model together with an exhaustive $k$NN search through the training data (memory bank) to achieve state-of-the-art results. We investigate whether we can improve the $k$NN-LM performance by instead training an LM with the knowledge that a $k$NN search will be applied post hoc. We achieve significant improvements with our method on language modeling tasks on \texttt{WIKI-2} and \texttt{WIKI-103}. The main phenomenon that we encounter is that adding a simple L2 regularization on the activations (not the weights) of the model, a transformer, improves post-hoc $k$NN classification performance. We explore some possible reasons for this improvement. In particular, we find that the added L2 regularization seems to improve the performance for high-frequency words without deteriorating the performance for low-frequency ones.
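As a concrete illustration of the training objective described above, the following is a minimal sketch, not the authors' implementation: it adds an L2 penalty on the transformer's final hidden states (the activations later used as $k$NN keys) to the standard language modeling loss. The choice of GPT-2, the variable names, and the coefficient \texttt{lambda\_l2} are illustrative assumptions, not details taken from the paper.

\begin{verbatim}
# Sketch: LM loss plus an L2 penalty on activations (not weights).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
lambda_l2 = 0.01  # strength of the activation regularizer (hypothetical value)

batch = tokenizer("the quick brown fox jumps over the lazy dog",
                  return_tensors="pt")
out = model(**batch, labels=batch["input_ids"], output_hidden_states=True)

hidden = out.hidden_states[-1]                 # activations used as kNN keys post hoc
l2_penalty = hidden.pow(2).sum(dim=-1).mean()  # mean squared L2 norm per position
loss = out.loss + lambda_l2 * l2_penalty       # cross-entropy LM loss + activation penalty
loss.backward()
\end{verbatim}

The same hidden states would then be stored in the memory bank and queried with an exhaustive $k$NN search at inference time, as in $k$NN-LM.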