Most recent coreference resolution systems use search algorithms over possible spans to identify mentions and resolve coreference. We instead present a coreference resolution system that uses a text-to-text (seq2seq) paradigm to predict mentions and links jointly. We implement the coreference system as a transition system and use multilingual T5 as the underlying language model. We obtain state-of-the-art accuracy on the CoNLL-2012 datasets with an F1-score of 83.3 for English (2.3 higher than previous work (Dobrovolskii, 2021)) using only CoNLL data for training, 68.5 for Arabic (+4.1 over previous work), and 74.3 for Chinese (+5.3). In addition, we use the SemEval-2010 datasets for experiments in a zero-shot setting, a few-shot setting, and a supervised setting using all available training data. We obtain substantially higher zero-shot F1-scores than previous approaches for 3 out of 4 languages and significantly exceed the previous supervised state-of-the-art results for all five tested languages.