Recently, very large pre-trained models have achieved state-of-the-art results in various natural language processing (NLP) tasks, but their size makes it challenging to apply them in resource-constrained environments. Compression techniques make it possible to drastically reduce the size of these models, and therefore their inference time, with negligible impact on top-tier metrics. However, general performance averaged across multiple tasks and/or languages may hide a drastic performance drop on under-represented features, which could result in the amplification of biases encoded by the models. In this work, we assess the impact of compression methods on Multilingual Neural Machine Translation (MNMT) models for various language groups, gender, and semantic biases through an extensive analysis of compressed models on different machine translation benchmarks, i.e., FLORES-101, MT-Gender, and DiBiMT. We show that the performance of under-represented languages drops significantly, while the average BLEU metric only slightly decreases. Interestingly, the removal of noisy memorization through compression leads to a significant improvement for some medium-resource languages. Finally, we demonstrate that compression amplifies intrinsic gender and semantic biases, even in high-resource languages. Code: https://github.com/alirezamshi/bias-compressedMT