Multilingual models are often particularly dependent on scaling to generalize to a growing number of languages. Compression techniques are widely relied upon to reconcile the growth in model size with real-world resource constraints, but compression can have a disparate effect on model performance for low-resource languages. It is thus crucial to understand the trade-offs between scale, multilingualism, and compression. In this work, we propose an experimental framework to characterize the impact of sparsifying multilingual pre-trained language models during fine-tuning. Applying this framework to mBERT named entity recognition models across 40 languages, we find that compression confers several intriguing and previously unknown generalization properties. In contrast to prior findings, we observe that compression may improve model robustness over dense counterparts. We additionally find that, under certain sparsification regimes, compression may aid, rather than disproportionately impact, the performance of low-resource languages.