Existing language models (LMs) predict tokens with a softmax over a finite vocabulary, which can make it difficult to predict rare tokens or phrases. We introduce NPM, the first nonparametric masked language model that replaces this softmax with a nonparametric distribution over every phrase in a reference corpus. We show that NPM can be efficiently trained with a contrastive objective and an in-batch approximation to full corpus retrieval. Zero-shot evaluation on 9 closed-set tasks and 7 open-set tasks demonstrates that NPM outperforms significantly larger parametric models, either with or without a retrieve-and-generate approach. It is particularly better at dealing with rare patterns (word senses or facts) and at predicting rare or nearly unseen words (e.g., non-Latin script). We release the model and code at github.com/facebookresearch/NPM.
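To make the core idea concrete, below is a minimal sketch of how a nonparametric distribution over corpus phrases can be formed: instead of a softmax over a fixed vocabulary, the masked-position query embedding is scored against embeddings of every candidate phrase in a reference corpus and the scores are normalized. This is an illustrative simplification under assumed inputs (precomputed `query_emb` and `phrase_embs` arrays, a hypothetical `temperature` parameter), not the released implementation.

```python
import numpy as np

def nonparametric_phrase_distribution(query_emb, phrase_embs, temperature=1.0):
    """Score every candidate phrase in the reference corpus against the
    masked-position query embedding, then normalize with a softmax.

    query_emb:   (d,)   embedding of the [MASK] position from the encoder
    phrase_embs: (n, d) embeddings of all candidate phrases in the corpus
    Returns a probability distribution of shape (n,) over corpus phrases.
    """
    scores = phrase_embs @ query_emb / temperature   # similarity to each phrase
    scores -= scores.max()                           # for numerical stability
    probs = np.exp(scores)
    return probs / probs.sum()

# Toy usage: 5 candidate phrases with 8-dim embeddings, one masked query.
rng = np.random.default_rng(0)
phrase_embs = rng.normal(size=(5, 8))
query_emb = rng.normal(size=8)
dist = nonparametric_phrase_distribution(query_emb, phrase_embs)
print(dist, dist.sum())  # distribution over the 5 corpus phrases, sums to 1
```

In training, the paper approximates retrieval over the full corpus with an in-batch contrastive objective; at inference the same scoring is applied against the entire reference corpus rather than a small batch.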