Pretraining Neural Language Models (NLMs) over a large corpus involves chunking the text into training examples, which are contiguous text segments of a size processable by the neural architecture. We highlight a bias introduced by this common practice: we prove that the pretrained NLM can model much stronger dependencies between text segments that appeared in the same training example than it can between text segments that appeared in different training examples. This intuitive result has a twofold role. First, it formalizes the motivation behind a broad line of recent successful NLM training heuristics, proposed for the pretraining and fine-tuning stages, which do not necessarily appear related at first glance. Second, our result clearly indicates that further improvements can be made in NLM pretraining for the benefit of Natural Language Understanding tasks. As an example, we propose "kNN-Pretraining": we show that including semantically related non-neighboring sentences in the same pretraining example yields improved sentence representations and open-domain question answering abilities. This theoretically motivated degree of freedom for "pretraining example design" indicates new training schemes for self-improving representations.
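To make the "pretraining example design" idea concrete, the following is a minimal sketch of how semantically related non-neighboring sentences could be packed into a single pretraining example, in the spirit of kNN-Pretraining. It assumes sentence embeddings have already been computed by some encoder; the function name, the choice of k, and the exclusion of adjacent sentences are illustrative assumptions, not the paper's released implementation.

```python
import numpy as np

def build_knn_pretraining_examples(sentences, embeddings, k=4):
    """Sketch: pack each sentence with its k nearest non-neighboring
    sentences (by cosine similarity) into one pretraining example.

    sentences:  list of N strings
    embeddings: float array of shape (N, d), one row per sentence
    """
    # Normalize rows so that dot products equal cosine similarities.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sims = unit @ unit.T
    np.fill_diagonal(sims, -np.inf)  # a sentence is not its own neighbor

    examples = []
    for i, sent in enumerate(sentences):
        # Exclude directly adjacent sentences, which already co-occur
        # in standard contiguous-chunk pretraining examples.
        for j in (i - 1, i + 1):
            if 0 <= j < len(sentences):
                sims[i, j] = -np.inf
        # Indices of the k most similar remaining sentences.
        neighbors = np.argsort(-sims[i])[:k]
        examples.append(" ".join([sent] + [sentences[j] for j in neighbors]))
    return examples
```

In this sketch, each resulting example places semantically related but non-contiguous sentences inside the same training segment, so that the pretrained model can learn stronger dependencies between them than chunk boundaries would otherwise allow.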