The Book of the Dean of Lismore (BDL) is a 16th-century Scottish Gaelic manuscript written in a non-standard orthography. In this work, we outline the problem of transliterating the text of the BDL into a standardised orthography, and perform exploratory experiments using Transformer-based models for this task. In particular, we focus on the task of word-level transliteration, and achieve a character-level BLEU score of 54.15 with our best model, a BART architecture pre-trained on the text of Scottish Gaelic Wikipedia and then fine-tuned on around 2,000 word-level parallel examples. Our initial experiments give promising results, but we highlight the shortcomings of our model, and discuss directions for future work.
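For readers unfamiliar with the metric, character-level BLEU treats each character of a word as a token and scores n-gram overlap between the predicted and reference spellings. The following is a minimal sketch of a corpus-level version, assuming n-grams up to order 4, a standard brevity penalty, and no smoothing; the abstract does not state the exact BLEU configuration or tooling used, so these choices and the function names `char_ngrams` and `char_bleu` are illustrative assumptions only.

```python
import math
from collections import Counter


def char_ngrams(word, n):
    """Count the character n-grams of order n in a single word."""
    return Counter(word[i:i + n] for i in range(len(word) - n + 1))


def char_bleu(hypotheses, references, max_n=4):
    """Corpus-level character BLEU on a 0-100 scale (unsmoothed sketch).

    Assumption: one reference per hypothesis, uniform n-gram weights.
    """
    matches = [0] * max_n  # clipped n-gram matches per order
    totals = [0] * max_n   # hypothesis n-gram counts per order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = char_ngrams(hyp, n)
            ref_counts = char_ngrams(ref, n)
            # Counter intersection clips each n-gram at its reference count.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())

    # Without smoothing, any order with zero matches gives a score of zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0

    # Geometric mean of the n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty for hypotheses shorter than the references.
    brevity = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)


if __name__ == "__main__":
    # Toy strings for illustration only; these are not data from the BDL.
    predicted = ["abcde", "hello"]
    reference = ["abcdf", "hello"]
    print(round(char_bleu(predicted, reference), 2))
```

Because the score is computed over characters, a prediction that differs from the reference by a single letter still receives substantial credit, which suits word-level transliteration where outputs are often near misses.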