We introduce a method for improving the structural understanding abilities of language models. Unlike previous approaches that finetune the models with task-specific augmentation, we pretrain language models on a collection of task-agnostic corpora to generate structures from text. Our structure pretraining enables zero-shot transfer of the learned knowledge that models have about the structure tasks. We study the performance of this approach on 28 datasets, spanning 10 structure prediction tasks including open information extraction, joint entity and relation extraction, named entity recognition, relation classification, semantic role labeling, event extraction, coreference resolution, factual probe, intent detection, and dialogue state tracking. We further enhance the pretraining with the task-specific training sets. We show that a 10B parameter language model transfers non-trivially to most tasks and obtains state-of-the-art performance on 21 of 28 datasets that we evaluate.
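To make the text-to-structure formulation concrete, the following is a minimal, hypothetical Python sketch of how a structure-pretraining example might be built: raw text is paired with a linearized set of (head, relation, tail) triples so that a sequence-to-sequence language model learns to generate structures from text. The task prefix and the triple serialization shown here are illustrative assumptions, not the paper's exact format.

```python
# Minimal sketch (assumed format, not the paper's exact serialization):
# structure pretraining casts structure prediction as conditional generation,
# pairing raw text with a linearized set of triples so a single seq2seq
# language model can later transfer zero-shot to tasks such as open
# information extraction or relation classification.

from typing import List, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)

def serialize_triples(triples: List[Triple]) -> str:
    """Linearize structured output into a target string for generation."""
    return " ".join(f"( {h} ; {r} ; {t} )" for h, r, t in triples)

def make_pretraining_example(text: str, triples: List[Triple]) -> dict:
    """Build one (source, target) pair for text-to-structure pretraining."""
    return {
        "source": f"generate structure: {text}",  # hypothetical task prefix
        "target": serialize_triples(triples),
    }

example = make_pretraining_example(
    "Barack Obama was born in Honolulu.",
    [("Barack Obama", "place of birth", "Honolulu")],
)
print(example["source"])  # generate structure: Barack Obama was born in Honolulu.
print(example["target"])  # ( Barack Obama ; place of birth ; Honolulu )
```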