Span extraction, which aims to extract text spans (such as words or phrases) from plain texts, is a fundamental process in Information Extraction. Recent works introduce label knowledge to enhance the text representation by formalizing the span extraction task as a question answering problem (QA Formalization), which achieves state-of-the-art performance. However, QA Formalization does not fully exploit the label knowledge and suffers from low efficiency in training and inference. To address these problems, we introduce a new paradigm for integrating label knowledge and further propose a novel model that explicitly and efficiently integrates label knowledge into text representations. Specifically, it encodes texts and label annotations independently and then integrates label knowledge into the text representation with an elaborately designed semantics fusion module. We conduct extensive experiments on three typical span extraction tasks: flat NER, nested NER, and event detection. The empirical results show that our method 1) achieves state-of-the-art performance on four benchmarks, and 2) reduces training and inference time by 76% and 77% on average, respectively, compared with the QA Formalization paradigm. Our code and data are available at https://github.com/Akeepers/LEAR.
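The general idea of encoding texts and label annotations independently and then fusing them can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' semantics fusion module: the fusion rule (each text token attends over a label's annotation tokens, and the attended summary is added back to the token vector) and the helper names `fuse_label_knowledge` and `encoder_passes` are hypothetical. Plain lists of floats stand in for encoder outputs. The pass-counting helper also assumes each label annotation is encoded once and reused.

```python
import math
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_label_knowledge(
    token_vecs: List[Vector], label_token_vecs: List[List[Vector]]
) -> List[List[Vector]]:
    """Return fused[c][i], the vector of text token i enriched with label c.

    token_vecs: one vector per text token (the text is encoded once).
    label_token_vecs: for each label, the vectors of its annotation tokens
    (each annotation is encoded once, independently of the text).
    Hypothetical fusion rule: attention followed by residual addition.
    """
    fused = []
    for label_tokens in label_token_vecs:
        per_label = []
        for h in token_vecs:
            # Token h attends over the tokens of this label's annotation.
            weights = softmax([dot(h, u) for u in label_tokens])
            summary = [
                sum(w * u[d] for w, u in zip(weights, label_tokens))
                for d in range(len(h))
            ]
            # Add the attended label summary to the token vector.
            per_label.append([x + s for x, s in zip(h, summary)])
        fused.append(per_label)
    return fused


def encoder_passes(num_sentences: int, num_labels: int) -> Dict[str, int]:
    """Count encoder forward passes under each paradigm (simplified)."""
    return {
        # QA Formalization: one (label query, sentence) pair per pass.
        "qa": num_sentences * num_labels,
        # Independent encoding: each sentence once, each annotation once.
        "independent": num_sentences + num_labels,
    }
```

For example, `encoder_passes(1000, 4)` returns `{'qa': 4000, 'independent': 1004}`. In this simplified count, the cost of independent encoding grows additively rather than multiplicatively with the number of labels, which conveys the intuition behind the efficiency claim; real savings depend on batching, sequence lengths, and the fusion module's own cost.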