Recent years have seen a paradigm shift in NLP towards using pretrained language models (PLMs) for a wide range of tasks. However, representing structures (e.g., tagged text, coreference chains) in a form that PLMs can capture involves many difficult design decisions. Prior work on structured prediction with PLMs typically flattens the structured output into a sequence, which limits the quality of the structural information being learned and leads to inferior performance compared to classic discriminative models. In this work, we describe an approach that models structures as sequences of actions in an autoregressive manner with PLMs, allowing in-structure dependencies to be learned without any loss. Our approach achieves new state-of-the-art results on all of the structured prediction tasks we consider, namely named entity recognition, end-to-end relation extraction, and coreference resolution.
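To make the idea of linearizing a structure into an action sequence concrete, the following is a minimal sketch, assuming a toy action inventory (BEGIN/COPY/END) that is purely illustrative and not the actual action set used in this work: it flattens named-entity spans into a left-to-right sequence of actions that an autoregressive PLM could be trained to emit one step at a time, modeling p(a_t | a_<t, input).

```python
# Illustrative sketch only: a toy action-sequence encoding for NER,
# not the exact action set or model used in this work.

from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end, label), end exclusive


def linearize(tokens: List[str], spans: List[Span]) -> List[str]:
    """Flatten tokens plus entity spans into one action sequence.

    Each token becomes a COPY action; an entity span additionally
    emits BEGIN[label] before its first token and END after its last,
    so the whole structure can be generated left to right by an
    autoregressive model, one action at a time.
    """
    starts = {s: lab for s, _, lab in spans}
    ends = {e for _, e, _ in spans}
    actions = []
    for i, tok in enumerate(tokens):
        if i in starts:
            actions.append(f"BEGIN[{starts[i]}]")
        actions.append(f"COPY[{tok}]")
        if i + 1 in ends:
            actions.append("END")
    return actions


if __name__ == "__main__":
    tokens = ["Barack", "Obama", "visited", "Paris"]
    spans = [(0, 2, "PER"), (3, 4, "LOC")]
    # An autoregressive PLM would model p(a_t | a_<t, tokens) over this sequence.
    print(linearize(tokens, spans))
    # ['BEGIN[PER]', 'COPY[Barack]', 'COPY[Obama]', 'END',
    #  'COPY[visited]', 'BEGIN[LOC]', 'COPY[Paris]', 'END']
```

Because every structural decision appears as an explicit action conditioned on all previous actions, dependencies within the structure (e.g., that an END must close an open BEGIN) can in principle be captured by the autoregressive factorization rather than discarded by a lossy flattening.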