研究用于机器翻译的大型培训前语言模型:你不知道什么 (Examining Large Pre-Trained Language Models for Machine Translation: What You Don't Know About It) - 专知论文

会员服务 ·

0

语言模型化 · MoDELS · 得分 · 相同 · ROUGE ·

2022 年 10 月 25 日

Examining Large Pre-Trained Language Models for Machine Translation: What You Don't Know About It

翻译：研究用于机器翻译的大型培训前语言模型:你不知道什么

Lifeng Han,Gleb Erofeev,Irina Sorokina,Serge Gladkoff,Goran Nenadic

from arxiv, System paper Accepted to WMT2022: BiomedicalMT Track (ClinSpEn2022)

Pre-trained language models (PLMs) often take advantage of the monolingual and multilingual dataset that is freely available online to acquire general or mixed domain knowledge before deployment into specific tasks. Extra-large PLMs (xLPLMs) are proposed very recently to claim supreme performances over smaller-sized PLMs such as in machine translation (MT) tasks. These xLPLMs include Meta-AI's wmt21-dense-24-wide-en-X (2021) and NLLB (2022). In this work, we examine if xLPLMs are absolutely superior to smaller-sized PLMs in fine-tuning toward domain-specific MTs. We use two different in-domain data of different sizes: commercial automotive in-house data and clinical shared task data from the ClinSpEn2022 challenge at WMT2022. We choose popular Marian Helsinki as smaller sized PLM and two massive-sized Mega-Transformers from Meta-AI as xLPLMs. Our experimental investigation shows that 1) on smaller-sized in-domain commercial automotive data, xLPLM wmt21-dense-24-wide-en-X indeed shows much better evaluation scores using SacreBLEU and hLEPOR metrics than smaller-sized Marian, even though its score increase rate is lower than Marian after fine-tuning; 2) on relatively larger-size well prepared clinical data fine-tuning, the xLPLM NLLB tends to lose its advantage over smaller-sized Marian on two sub-tasks (clinical terms and ontology concepts) using ClinSpEn offered metrics METEOR, COMET, and ROUGE-L, and totally lost to Marian on Task-1 (clinical cases) on all official metrics including SacreBLEU and BLEU; 3) metrics do not always agree with each other on the same tasks using the same model outputs; 4) clinic-Marian ranked No.2 on Task- 1 (via SACREBLEU/BLEU) and Task-3 (via METEOR and ROUGE) among all submissions.

翻译：培训前语言模型(PLMS) 常常利用在线免费的单语和多语言数据集(PLM) 来获取通用或混合域知识。最近,我们提议使用超大型 PLMS(xLPLM) 来声称比机器翻译(MT) 等小型的PLM(MT) 任务拥有最高性能。这些xLPLMM(MetA-AI wmt21-dense-24-全X(2021) 和NLLLLB(2022) 。在这项工作中,我们检查xLPLMM(TLM) 是否绝对优于小型的PLMS(PM) 微小的SLMM(TLM) 微小的SLMLM(MLM) 和小的SMALM(MAL-NM(MA) 最小型的SDRML) 、更小型的SMALM(S-RML-NL) 和小的S-RV(MAL) 更小型的S- RDO- RDL- RDRV-RV-RV-RV- RD-RV- RV-RV-RV-RV-R-R-RV-R-R-RV-R-R-R-R-R-RV-S-S-RV-S-S-S-S-S-S-RV-S-S-S-S-S-S-S-RV-RV-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-

0

相关内容

语言模型化

语言模型化

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

中国图象图形学学会CSIG

2+阅读 · 2021年11月12日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

中国图象图形学学会CSIG

0+阅读 · 2021年11月10日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

中国图象图形学学会CSIG

0+阅读 · 2021年11月3日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Biot固结方程的有限元方法及快速算法研究

国家自然科学基金

0+阅读 · 2015年12月31日

基于Wyner-Ziv分布式编码的无线视频通信端到端失真度估算

国家自然科学基金

0+阅读 · 2014年12月31日

考虑观测值时空相关性的InSAR三维形变估计方法

国家自然科学基金

0+阅读 · 2013年12月31日

羊八井观测站大气不透明度的测量

国家自然科学基金

0+阅读 · 2013年12月31日

高阶Schwarz导数与Teichmuller空间紧化

国家自然科学基金

0+阅读 · 2012年12月31日

非线性椭圆型偏微分方程的边界正则性

国家自然科学基金

0+阅读 · 2012年12月31日

若干数学物理问题的谱方法和配置方法研究

国家自然科学基金

0+阅读 · 2011年12月31日

Arc在颞叶癫痫发生及认知功能损害中的作用

国家自然科学基金

0+阅读 · 2009年12月31日

TR3相互作用新蛋白机理研究

国家自然科学基金

1+阅读 · 2008年12月31日

遥感分类中的尺度效应机理与多尺度遥感数据分类方法研究

国家自然科学基金

0+阅读 · 2008年12月31日

Prompting Is Programming: A Query Language For Large Language Models

Arxiv

0+阅读 · 2022年12月12日

Increasing the Cost of Model Extraction with Calibrated Proof of Work

Arxiv

0+阅读 · 2022年12月12日

Exploring Discrete Diffusion Models for Image Captioning

Exploring Discrete Diffusion Models for Image Captioning

Arxiv

0+阅读 · 2022年12月9日

Variational Diffusion Models

Arxiv

0+阅读 · 2022年12月9日

Prompt Distribution Learning

Arxiv

14+阅读 · 2022年5月6日

Conditional Prompt Learning for Vision-Language Models

Conditional Prompt Learning for Vision-Language Models

Arxiv

13+阅读 · 2022年3月10日

Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing

Arxiv

30+阅读 · 2021年7月28日

Pre-Trained Models: Past, Present and Future

Arxiv

19+阅读 · 2021年6月15日

Making Pre-trained Language Models Better Few-shot Learners

Arxiv

14+阅读 · 2020年12月31日

The Unreasonable Effectiveness of Deep Features as a Perceptual Metric

Arxiv

11+阅读 · 2018年1月11日

VIP会员

文章信息

相关主题

语言模型化

相关VIP内容

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

多模态大语言模型下游调优中“保持自我”的重要性

隐身自主无人水下航行器技术如何变革水下作战并重塑海军竞争

AI智能体时代中的记忆：形式、功能与动态综述

《俄乌战争中的无人系统：新的战争方式与新兴趋势——来自前线的印象》报告

相关资讯

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

中国图象图形学学会CSIG

2+阅读 · 2021年11月12日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

中国图象图形学学会CSIG

0+阅读 · 2021年11月10日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

中国图象图形学学会CSIG

0+阅读 · 2021年11月3日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

相关论文

Prompting Is Programming: A Query Language For Large Language Models

Arxiv

0+阅读 · 2022年12月12日

Increasing the Cost of Model Extraction with Calibrated Proof of Work

Arxiv

0+阅读 · 2022年12月12日

Exploring Discrete Diffusion Models for Image Captioning

Exploring Discrete Diffusion Models for Image Captioning

Arxiv

0+阅读 · 2022年12月9日

Variational Diffusion Models

Arxiv

0+阅读 · 2022年12月9日

Prompt Distribution Learning

Arxiv

14+阅读 · 2022年5月6日

Conditional Prompt Learning for Vision-Language Models

Conditional Prompt Learning for Vision-Language Models

Arxiv

13+阅读 · 2022年3月10日

Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing

Arxiv

30+阅读 · 2021年7月28日

Pre-Trained Models: Past, Present and Future

Arxiv

19+阅读 · 2021年6月15日

Making Pre-trained Language Models Better Few-shot Learners

Arxiv

14+阅读 · 2020年12月31日

The Unreasonable Effectiveness of Deep Features as a Perceptual Metric

Arxiv

11+阅读 · 2018年1月11日

相关基金

Biot固结方程的有限元方法及快速算法研究

国家自然科学基金

0+阅读 · 2015年12月31日

基于Wyner-Ziv分布式编码的无线视频通信端到端失真度估算

国家自然科学基金

0+阅读 · 2014年12月31日

考虑观测值时空相关性的InSAR三维形变估计方法

国家自然科学基金

0+阅读 · 2013年12月31日

羊八井观测站大气不透明度的测量

国家自然科学基金

0+阅读 · 2013年12月31日

高阶Schwarz导数与Teichmuller空间紧化

国家自然科学基金

0+阅读 · 2012年12月31日

非线性椭圆型偏微分方程的边界正则性

国家自然科学基金

0+阅读 · 2012年12月31日

若干数学物理问题的谱方法和配置方法研究

国家自然科学基金

0+阅读 · 2011年12月31日

Arc在颞叶癫痫发生及认知功能损害中的作用

国家自然科学基金

0+阅读 · 2009年12月31日

TR3相互作用新蛋白机理研究

国家自然科学基金

1+阅读 · 2008年12月31日

遥感分类中的尺度效应机理与多尺度遥感数据分类方法研究

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员