The transformer has been shown to outperform recurrent neural network-based sequence-to-sequence models on various word-level NLP tasks. Yet for character-level transduction tasks, e.g., morphological inflection generation and historical text normalization, few works have managed to outperform recurrent models with the transformer. In an empirical study, we uncover that, in contrast to recurrent sequence-to-sequence models, batch size plays a crucial role in the performance of the transformer on character-level tasks, and we show that with a large enough batch size the transformer does indeed outperform recurrent models. We also introduce a simple technique for handling feature-guided character-level transduction that further improves performance. With these insights, we achieve state-of-the-art performance on morphological inflection and historical text normalization. We also show that the transformer outperforms a strong baseline on two other character-level transduction tasks: grapheme-to-phoneme conversion and transliteration.
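To make the task concrete, the sketch below shows one way a feature-guided character-level transduction example (here, morphological inflection) might be formatted: the source side carries the lemma's characters together with morphological feature tags, and the target side is the inflected form's characters. The tag names and the choice to prepend the tags are illustrative assumptions, not necessarily the exact scheme used in the paper.

```python
# A minimal, hypothetical illustration of a feature-guided character-level
# transduction example for morphological inflection. The feature tag names
# ("V", "PST") and the prepend-tags formatting are assumptions for the sake
# of the example, not the paper's implementation.

def make_example(lemma: str, features: list, inflected: str):
    # Treat each morphological feature tag as a single source-side token
    # and prepend it to the character sequence of the lemma.
    source = list(features) + list(lemma)
    # The target is simply the character sequence of the inflected form.
    target = list(inflected)
    return source, target

if __name__ == "__main__":
    src, tgt = make_example("walk", ["V", "PST"], "walked")
    print(src)  # ['V', 'PST', 'w', 'a', 'l', 'k']
    print(tgt)  # ['w', 'a', 'l', 'k', 'e', 'd']
```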