Pretrained multilingual language models have become a common tool in transferring NLP capabilities to low-resource languages, often with adaptations. In this work, we study the performance, extensibility, and interaction of two such adaptations: vocabulary augmentation and script transliteration. Our evaluations on part-of-speech tagging, universal dependency parsing, and named entity recognition in nine diverse low-resource languages uphold the viability of these approaches while raising new questions around how to optimally adapt multilingual models to low-resource settings.
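To make the two adaptations concrete, here is a minimal sketch assuming the HuggingFace transformers library and the bert-base-multilingual-cased checkpoint; the added tokens and the toy Cyrillic-to-Latin table are hypothetical placeholders for illustration, not the actual experimental setup of this work.

```python
# Minimal sketch of the two adaptations, assuming HuggingFace transformers.
from transformers import AutoTokenizer, AutoModelForTokenClassification

model_name = "bert-base-multilingual-cased"  # any pretrained multilingual model
tokenizer = AutoTokenizer.from_pretrained(model_name)
# 17 labels = the Universal Dependencies UPOS tag set, for POS tagging.
model = AutoModelForTokenClassification.from_pretrained(model_name, num_labels=17)

# 1) Vocabulary augmentation: add frequent target-language word pieces that
#    the pretrained vocabulary lacks, then grow the embedding matrix so the
#    new rows can be trained during continued pretraining or fine-tuning.
new_tokens = ["ejiw", "qawra"]  # hypothetical target-language tokens
num_added = tokenizer.add_tokens(new_tokens)
model.resize_token_embeddings(len(tokenizer))

# 2) Script transliteration: rewrite target-language text into a script the
#    model saw heavily during pretraining (toy Cyrillic-to-Latin table here).
CYR_TO_LAT = str.maketrans({"а": "a", "б": "b", "в": "v", "г": "g",
                            "д": "d", "е": "e", "ж": "zh", "з": "z"})

def transliterate(text: str) -> str:
    return text.translate(CYR_TO_LAT)

print(tokenizer.tokenize(transliterate("вода")))  # subwords in Latin script
```

In practice, the newly added embedding rows start from a generic initialization and only become useful after further training on target-language text, which is one reason the interaction between augmentation and transliteration is worth studying rather than assuming either adaptation helps in isolation.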