Pre-trained language models (PLMs) often take advantage of the monolingual and multilingual datasets that are freely available online to acquire general or mixed-domain knowledge before deployment to specific tasks. Extra-large PLMs (xLPLMs) have recently been proposed, claiming supreme performance over smaller-sized PLMs on tasks such as machine translation (MT). These xLPLMs include Meta-AI's wmt21-dense-24-wide-en-X and NLLB. \textit{In this work, we examine whether xLPLMs are absolutely superior to smaller-sized PLMs when fine-tuned toward domain-specific MT.} We use two in-domain datasets of different sizes: commercial automotive in-house data and \textbf{clinical} shared-task data from the ClinSpEn2022 challenge at WMT2022. We choose the popular Marian Helsinki as our smaller-sized PLM and two massive-sized Mega-Transformers from Meta-AI as xLPLMs. Our experimental investigation shows that 1) on the smaller-sized in-domain commercial automotive data, the xLPLM wmt21-dense-24-wide-en-X indeed achieves much better evaluation scores on the S\textsc{acre}BLEU and hLEPOR metrics than the smaller-sized Marian, even though its score increase rate after fine-tuning is lower than Marian's; 2) when fine-tuned on the relatively larger, well-prepared clinical data, the xLPLM NLLB \textbf{tends to lose} its advantage over the smaller-sized Marian on two sub-tasks (clinical terms and ontology concepts) under the ClinSpEn-offered metrics METEOR, COMET, and ROUGE-L, and loses entirely to Marian on Task-1 (clinical cases) across all metrics, including S\textsc{acre}BLEU and BLEU; 3) \textbf{metrics do not always agree} with each other on the same tasks using the same model outputs.
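As a minimal illustrative sketch (not the authors' released pipeline), the following Python snippet shows how a smaller-sized Marian Helsinki checkpoint can be loaded and its translations scored with S\textsc{acre}BLEU via the Hugging Face \texttt{transformers} and \texttt{sacrebleu} packages; the checkpoint name, language direction, and toy sentences are assumptions for illustration only, and the paper's in-domain fine-tuning data are not reproduced here.
\begin{verbatim}
# Minimal sketch: score a public Marian Helsinki MT model with SacreBLEU.
# Assumptions: Helsinki-NLP/opus-mt-en-es checkpoint, EN->ES direction,
# and toy clinical-style sentences; none of these come from the paper.
from transformers import MarianMTModel, MarianTokenizer
import sacrebleu

model_name = "Helsinki-NLP/opus-mt-en-es"   # assumed public checkpoint
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

sources = ["The patient presented with acute chest pain."]      # toy source
references = [["El paciente presento dolor toracico agudo."]]   # toy reference

# Translate the source sentences with the pre-trained (or fine-tuned) model.
batch = tokenizer(sources, return_tensors="pt", padding=True)
generated = model.generate(**batch)
hypotheses = tokenizer.batch_decode(generated, skip_special_tokens=True)

# Corpus-level SacreBLEU, one of the metrics reported alongside hLEPOR,
# METEOR, COMET, and ROUGE-L in the paper's evaluation.
bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"SacreBLEU: {bleu.score:.2f}")
\end{verbatim}
The same scoring step would be applied to outputs from a fine-tuned xLPLM (e.g., NLLB or wmt21-dense-24-wide-en-X) to make the smaller-versus-larger comparison described above.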