Pretrained, large, generative language models (LMs) have had great success in a wide range of sequence tagging and structured prediction tasks. Casting a sequence tagging task as a Seq2Seq one requires deciding the formats of the input and output sequences. However, we lack a principled understanding of the trade-offs associated with these formats (such as their effect on model accuracy, sequence length, multilingual generalization, and hallucination). In this paper, we rigorously study the different formats one could use to cast input text sentences and their output labels into the input and target (i.e., output) of a Seq2Seq model. Along the way, we introduce a new format, which we show to be both simpler and more effective. Additionally, the new format demonstrates significant gains in multilingual settings -- both zero-shot transfer learning and joint training. Lastly, we find that the new format is more robust and almost completely devoid of hallucination -- an issue we find common in existing formats. With well over 1,000 experiments studying 14 different formats across 7 diverse public benchmarks -- including 3 multilingual datasets spanning 7 languages -- we believe our findings provide a strong empirical basis for understanding how we should tackle sequence tagging tasks.
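To make the notion of input/target formats concrete, here is a minimal sketch of two hypothetical serializations of a token-level tagging example into Seq2Seq strings. These are purely illustrative assumptions, not the specific formats studied in the paper: one emits only the label sequence as the target, the other inlines labels into a copy of the sentence.

```python
# Illustrative sketch (hypothetical formats, not the paper's): two ways to
# serialize a sequence tagging example into Seq2Seq (input, target) strings.
from typing import List, Tuple


def to_tag_sequence(tokens: List[str], tags: List[str]) -> Tuple[str, str]:
    """Format A: the target is just the label sequence, aligned by position."""
    return " ".join(tokens), " ".join(tags)


def to_inline_markup(tokens: List[str], tags: List[str]) -> Tuple[str, str]:
    """Format B: the target repeats the sentence with labels inlined after labeled tokens."""
    pieces = []
    for token, tag in zip(tokens, tags):
        pieces.append(token if tag == "O" else f"{token} [{tag}]")
    return " ".join(tokens), " ".join(pieces)


tokens = ["Obama", "visited", "Paris"]
tags = ["B-PER", "O", "B-LOC"]
print(to_tag_sequence(tokens, tags))
# ('Obama visited Paris', 'B-PER O B-LOC')
print(to_inline_markup(tokens, tags))
# ('Obama visited Paris', 'Obama [B-PER] visited Paris [B-LOC]')
```

Even this toy contrast exposes the trade-offs the abstract refers to: Format A keeps targets short but relies on positional alignment, while Format B is longer yet keeps labels anchored to the surface text, which affects accuracy, output length, and the model's opportunity to hallucinate.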