Translation has played a crucial role in improving performance on multilingual tasks: (1) generating target-language data from source-language data for training, and (2) generating source-language data from target-language data for inference. However, prior work has not considered using both translations simultaneously. This paper shows that combining them yields synergistic gains on various multilingual sentence classification tasks. We empirically find that translation artifacts introduced by translators are the main factor behind the performance gain. Based on this analysis, we adopt two training methods, SupCon and MixUp, that account for translation artifacts. Furthermore, we propose a cross-lingual fine-tuning algorithm called MUSC, which uses SupCon and MixUp jointly and further improves performance. Our code is available at https://github.com/jongwooko/MUSC.
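The two training methods named above can be sketched in isolation. Below is a minimal NumPy illustration of a supervised contrastive (SupCon) loss and a MixUp augmentation step, not the authors' MUSC implementation; all function names, hyperparameters (temperature, Beta-distribution alpha), and the batch construction are illustrative assumptions.

```python
import numpy as np

def supcon_loss(features, labels, temperature=0.1):
    """Supervised contrastive (SupCon) loss over a batch of embeddings.

    Samples sharing a label are treated as positives; all other
    in-batch samples serve as negatives. Hyperparameters are illustrative.
    """
    # L2-normalize so the dot product is cosine similarity.
    features = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = features @ features.T / temperature
    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)
    # Numerically stable log-softmax over all other samples in the batch.
    logits = sim - sim.max(axis=1, keepdims=True)
    exp = np.exp(logits) * not_self
    log_prob = logits - np.log(exp.sum(axis=1, keepdims=True))
    # Positives: same label, excluding the anchor itself.
    positives = (labels[:, None] == labels[None, :]) & not_self
    per_anchor = -(log_prob * positives).sum(axis=1) / np.maximum(
        positives.sum(axis=1), 1)
    return per_anchor.mean()

def mixup(x, y_onehot, alpha=0.2, rng=None):
    """MixUp: convex combination of random example pairs and their labels."""
    rng = rng or np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)          # mixing coefficient ~ Beta(alpha, alpha)
    idx = rng.permutation(len(x))         # random partner for each example
    x_mix = lam * x + (1 - lam) * x[idx]
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[idx]
    return x_mix, y_mix
```

In a cross-lingual setting along the lines the abstract describes, a batch could mix original and translated examples of the same sentence so that the SupCon term pulls them together despite translation artifacts, while MixUp interpolates between examples; how MUSC actually combines the two objectives is specified in the paper, not here.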