Knowledge distillation (KD) is a well-known method for compressing neural models. However, despite the popularity and superiority of multilingual neural machine translation (MNMT), work on distilling knowledge from large MNMT models into smaller ones is practically nonexistent. This paper bridges this gap by presenting an empirical investigation of knowledge distillation for compressing MNMT models. We take Indic-to-English translation as a case study and demonstrate that commonly used language-agnostic and language-aware KD approaches yield models that are 4-5x smaller but suffer from performance drops of up to 3.5 BLEU. To mitigate this, we then experiment with design considerations such as shallower versus deeper models, heavy parameter sharing, multi-stage training, and adapters. We observe that deeper compact models tend to be as good as shallower non-compact ones, and that fine-tuning a distilled model on a high-quality subset slightly boosts translation quality. Overall, we conclude that compressing MNMT models via KD is challenging, indicating immense scope for further research.