The ability of pretrained Transformers to remember factual knowledge is essential for knowledge-intensive downstream tasks such as closed-book question answering. Existing work has shown that pretrained Transformers can, to some degree, recall or leverage factual knowledge that appears in the pretraining corpus. However, because model capacity is limited, so is the ability of pretrained models to remember factual knowledge. Dai et al. (2022) find that the Feed-Forward Networks (FFNs) in pretrained Transformers store factual knowledge in a memory-like manner. Inspired by this finding, we propose a Neural Knowledge Bank (NKB) to store extra factual knowledge for pretrained Transformers. Specifically, we likewise regard FFNs as key-value memories and extend them with additional memory slots. During knowledge injection, we freeze the original model and inject factual knowledge only into the extended memory slots, so the pretrained model does not suffer catastrophic forgetting. In addition, viewing FFNs as key-value memories makes the NKB highly interpretable. On three closed-book question answering datasets, we show that the NKB can effectively store extra factual knowledge. We also show, on two representative generation tasks (summarization and machine translation), that the NKB does not degrade the general language generation ability of pretrained models. Further, we thoroughly analyze the NKB to reveal its working mechanism and present the meaning of its keys and values in a human-readable way. Finally, we make a preliminary attempt to directly update the factual knowledge in the NKB without any additional training.
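The key-value view of an FFN with extra memory slots can be illustrated with a small toy sketch. It is not the authors' implementation: it assumes a ReLU activation, no bias terms, and tiny random weights, and the names `memory_read`, `ffn_with_nkb`, and `rand_matrix` are hypothetical. It only shows that appended slots with zero values leave the frozen FFN's output unchanged, and that writing to one extra slot changes the output without touching the original weights.

```python
import random


def relu(vec):
    return [max(0.0, a) for a in vec]


def memory_read(keys, values, x):
    """Key-value memory read: sum_i relu(key_i . x) * value_i."""
    weights = relu([sum(k * xi for k, xi in zip(key, x)) for key in keys])
    out = [0.0] * len(values[0])
    for w, value in zip(weights, values):
        for j, v in enumerate(value):
            out[j] += w * v
    return out


def ffn_with_nkb(x, ffn_keys, ffn_values, nkb_keys, nkb_values):
    """Append the extra slots to the memory; the original slots are not modified."""
    return memory_read(ffn_keys + nkb_keys, ffn_values + nkb_values, x)


def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
d_model, d_ffn, n_extra = 4, 8, 3  # toy sizes, chosen for illustration only

# "Pretrained" FFN weights (random stand-ins); these stay frozen throughout.
ffn_keys = rand_matrix(d_ffn, d_model)
ffn_values = rand_matrix(d_ffn, d_model)
frozen_copy = [row[:] for row in ffn_keys + ffn_values]

# Extra slots: zero values mean the extension initially contributes nothing.
nkb_keys = rand_matrix(n_extra, d_model)
nkb_values = [[0.0] * d_model for _ in range(n_extra)]

x = rand_matrix(1, d_model)[0]
original = memory_read(ffn_keys, ffn_values, x)
extended = ffn_with_nkb(x, ffn_keys, ffn_values, nkb_keys, nkb_values)

# Toy "injection": make slot 0 fire on x and emit a chosen value vector.
nkb_keys[0] = list(x)
nkb_values[0] = [1.0] * d_model
injected = ffn_with_nkb(x, ffn_keys, ffn_values, nkb_keys, nkb_values)
```

In this sketch the injected output differs from the original by `(x . x) * nkb_values[0]`, which is the contribution of the single activated extra slot. In an actual model the extra keys and values would be learned by training while the pretrained parameters are held fixed.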