Named Entity Recognition (NER) and Relation Extraction (RE) are core sub-tasks of information extraction. Many recent works formulate these two tasks as span (pair) classification problems and thus focus on how to obtain better span representations from a pre-trained encoder. However, a major limitation of existing works is that they ignore the dependencies between spans (pairs). In this work, we propose a novel span representation approach, named Packed Levitated Markers, which models the dependencies between spans (pairs) by strategically packing the markers in the encoder. In particular, we propose a group packing strategy that lets our model process a large number of spans together, so that their dependencies can be considered with limited resources. Furthermore, for the more complicated span pair classification tasks, we design a subject-oriented packing strategy, which packs each subject and all its objects into one instance to model the dependencies between span pairs that share the same subject. Our experiments show that our model with packed levitated markers outperforms the sequence labeling model by 0.4%-1.9% F1 on three flat NER tasks, beats the token concat model on six NER benchmarks, and obtains a 3.5%-3.6% strict relation F1 improvement at higher speed over previous SOTA models on ACE04 and ACE05. Code and models are publicly available at https://github.com/thunlp/PL-Marker.
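The group packing idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give implementation details, so everything here is an assumption made for illustration: the helper names (`enumerate_spans`, `pack_into_groups`, `build_position_ids`), the fixed group size, and the convention that each levitated marker reuses the position id of the boundary token it marks. It is not the code from the PL-Marker repository.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # inclusive (start, end) token indices


def enumerate_spans(num_tokens: int, max_width: int) -> List[Span]:
    """List every candidate span of at most `max_width` tokens."""
    return [
        (start, end)
        for start in range(num_tokens)
        for end in range(start, min(start + max_width, num_tokens))
    ]


def pack_into_groups(spans: List[Span], group_size: int) -> List[List[Span]]:
    """Split the candidate spans into groups (hypothetical fixed-size chunking).

    Each group is meant to be encoded together with the text in one pass,
    so that spans in the same group can be modeled jointly.
    """
    return [spans[i:i + group_size] for i in range(0, len(spans), group_size)]


def build_position_ids(num_tokens: int, group: List[Span]) -> List[int]:
    """Position ids for one packed instance: text tokens, then markers.

    Assumption: markers are appended after the text, all start markers
    first and all end markers second, and each marker shares the position
    id of the span boundary token it refers to.
    """
    text_positions = list(range(num_tokens))
    start_marker_positions = [start for start, _ in group]
    end_marker_positions = [end for _, end in group]
    return text_positions + start_marker_positions + end_marker_positions


if __name__ == "__main__":
    spans = enumerate_spans(num_tokens=3, max_width=2)
    groups = pack_into_groups(spans, group_size=2)
    for group in groups:
        print(group, build_position_ids(3, group))
```

Under these assumptions, the number of encoder passes grows with the number of groups rather than with the number of spans. A real model would also need an attention mask that controls which markers and text tokens can see each other, which this sketch leaves out.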