Recently, very large pre-trained models have achieved state-of-the-art results in various natural language processing (NLP) tasks, but their size makes them challenging to deploy in resource-constrained environments. Compression techniques can drastically reduce model size, and therefore inference time, with negligible impact on top-tier metrics. However, this aggregate performance hides a drastic drop on under-represented features, which could amplify the biases encoded by the model. In this work, we analyze the impact of compression methods on Multilingual Neural Machine Translation (MNMT) models for various language groups and semantic features, through an extensive evaluation of compressed models on several NMT benchmarks, e.g. FLORES-101, MT-Gender, and DiBiMT. Our experiments show that the performance of under-represented languages drops significantly, while the average BLEU metric decreases only slightly. Interestingly, removing noisy memorization through compression leads to a significant improvement for some medium-resource languages. Finally, we demonstrate that compression amplifies intrinsic gender and semantic biases, even in high-resource languages.
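To make the compress-then-evaluate setup concrete, below is a minimal sketch (not the authors' exact pipeline) of one common compression technique, post-training int8 dynamic quantization, applied to a multilingual NMT checkpoint and scored with BLEU. The model name, language pair, and example sentences are illustrative assumptions, not the paper's configuration.

    # Sketch: dynamic quantization of an MNMT model + BLEU scoring.
    # Model, language pair, and sentences are assumptions for illustration.
    import torch
    import sacrebleu
    from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

    model_name = "facebook/m2m100_418M"  # assumed multilingual NMT checkpoint
    tokenizer = M2M100Tokenizer.from_pretrained(model_name)
    model = M2M100ForConditionalGeneration.from_pretrained(model_name)

    # Compress: int8 dynamic quantization of the linear layers (one common method).
    quantized = torch.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )

    def translate(m, sentences, src_lang="en", tgt_lang="fr"):
        tokenizer.src_lang = src_lang
        batch = tokenizer(sentences, return_tensors="pt", padding=True)
        generated = m.generate(
            **batch, forced_bos_token_id=tokenizer.get_lang_id(tgt_lang)
        )
        return tokenizer.batch_decode(generated, skip_special_tokens=True)

    # Hypothetical FLORES-style source/reference pair, for illustration only.
    sources = ["Compression can change model behaviour on rare languages."]
    references = [["La compression peut modifier le comportement du modèle sur les langues rares."]]

    for name, m in [("full", model), ("quantized", quantized)]:
        hyps = translate(m, sources)
        bleu = sacrebleu.corpus_bleu(hyps, references)
        print(f"{name}: BLEU = {bleu.score:.1f}")

In practice, the paper's finding is that such aggregate BLEU comparisons can look benign while per-language and per-phenomenon scores diverge sharply, which is why the evaluation is broken down by language group and by benchmarks targeting gender and word-sense phenomena.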