Language Models underpin all modern Natural Language Processing (NLP) tasks. The introduction of the Transformer architecture has contributed significantly to making Language Modeling effective across many NLP tasks, leading to significant advancements in the field. However, Transformers come with a large computational cost, which grows quadratically with respect to the input length. This presents a challenge, as understanding long texts requires a large amount of context. In this paper, we propose a fine-tuning framework, named CoreLM, that extends the architecture of current Pretrained Language Models so that they incorporate explicit entity information. By introducing entity representations, we make available information outside the contextual space of the model, which results in a better Language Model for a fraction of the computational cost. We implement our approach using GPT2 and compare the fine-tuned model to the original. Our proposed model achieves a lower Perplexity on the GUMBY and LAMBADA datasets when compared to GPT2 and a fine-tuned version of GPT2 without any changes. We also compare the models' performance in terms of Accuracy on LAMBADA and the Children's Book Test, with and without the use of model-created coreference annotations.