Transformer language models (TLMs) are critical for most NLP tasks, but they are difficult to create for low-resource languages because of the large amounts of pretraining data they require. In this work, we investigate two techniques for training monolingual TLMs in a low-resource setting: greatly reducing TLM size, and complementing the masked language modeling objective with two linguistically rich supervised tasks (part-of-speech tagging and dependency parsing). Results from 7 diverse languages indicate that our model, MicroBERT, is able to produce marked improvements in downstream task evaluations relative to a typical monolingual TLM pretraining approach. Specifically, we find that monolingual MicroBERT models achieve gains of up to 18% for parser LAS and 11% for NER F1 compared to a multilingual baseline, mBERT, while having less than 1% of its parameter count. We conclude that reducing TLM parameter count and using labeled data for pretraining low-resource TLMs can yield large quality benefits and, in some cases, produce models that outperform multilingual approaches.
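The multitask pretraining idea described above (a small transformer encoder trained jointly on masked language modeling and part-of-speech tagging, with dependency parsing handled analogously) can be illustrated with a minimal sketch. This is not the authors' implementation: the layer sizes, vocabulary size, tag inventory, and the class name TinyMultitaskEncoder are hypothetical placeholders chosen only for illustration.

```python
import torch
import torch.nn as nn

class TinyMultitaskEncoder(nn.Module):
    """Hypothetical sketch: a very small transformer encoder with an MLM head
    and an auxiliary POS-tagging head sharing the same representations."""

    def __init__(self, vocab_size=8000, num_pos_tags=17, d_model=128,
                 n_heads=4, n_layers=3, dim_ff=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_ff,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.mlm_head = nn.Linear(d_model, vocab_size)    # predicts masked tokens
        self.pos_head = nn.Linear(d_model, num_pos_tags)  # predicts POS tags

    def forward(self, token_ids):
        h = self.encoder(self.embed(token_ids))
        return self.mlm_head(h), self.pos_head(h)

# Joint objective: MLM loss plus POS-tagging loss on the same shared encoder.
# Positions without a label use ignore_index=-100 so they do not contribute.
model = TinyMultitaskEncoder()
loss_fn = nn.CrossEntropyLoss(ignore_index=-100)

token_ids = torch.randint(0, 8000, (2, 16))              # toy batch of token ids
mlm_labels = torch.full((2, 16), -100)                    # mask out all positions...
mlm_labels[:, 3] = token_ids[:, 3]                        # ...except one per sequence
pos_labels = torch.randint(0, 17, (2, 16))                # toy POS tags

mlm_logits, pos_logits = model(token_ids)
loss = (loss_fn(mlm_logits.view(-1, 8000), mlm_labels.view(-1))
        + loss_fn(pos_logits.view(-1, 17), pos_labels.view(-1)))
loss.backward()
```

In practice, one would expect the MLM batches to come from unlabeled text and the tagging batches from a small annotated treebank, with an additional head (e.g., a biaffine parser) for dependency parsing; those details are omitted from this sketch.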