Knowledge Distillation for Quality Estimation

July 1, 2021

Amit Gajbhiye, Marina Fomicheva, Fernando Alva-Manchego, Frédéric Blain, Abiola Obamuyide, Nikolaos Aletras, Lucia Specia

From arXiv; Findings of ACL 2021

Quality Estimation (QE) is the task of automatically predicting Machine Translation quality in the absence of reference translations, making it applicable in real-time settings, such as translating online social media conversations. Recent success in QE stems from the use of multilingual pre-trained representations, where very large models lead to impressive results. However, the inference time, disk and memory requirements of such models do not allow for wide usage in the real world. Models trained on distilled pre-trained representations remain prohibitively large for many usage scenarios. We instead propose to directly transfer knowledge from a strong QE teacher model to a much smaller model with a different, shallower architecture. We show that this approach, in combination with data augmentation, leads to light-weight QE models that perform competitively with distilled pre-trained representations with 8x fewer parameters.
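
The teacher-to-student transfer described in the abstract can be illustrated with a toy regression sketch. This is not the authors' implementation: the abstract does not give the loss, the architectures, or the augmentation method, so everything below is an assumption made for illustration. The hypothetical `teacher_score` stands in for a large QE model, the student is a tiny linear model, each sentence pair is reduced to two made-up numeric features, and "augmentation" simply means scoring extra unlabelled inputs with the teacher.

```python
import random


def teacher_score(features):
    """Hypothetical stand-in for a large QE teacher model.

    Assumption: a fixed function of two numeric features. A real teacher
    would be a large multilingual pre-trained model.
    """
    x1, x2 = features
    return 0.7 * x1 - 0.3 * x2 + 0.1


def student_score(features, w, b):
    """Tiny linear student: far fewer parameters than a real teacher."""
    return w[0] * features[0] + w[1] * features[1] + b


def train_student(inputs, soft_targets, epochs=50, lr=0.1):
    """Fit the student by SGD on squared error against the teacher's scores."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in zip(inputs, soft_targets):
            err = student_score(x, w, b) - target
            w[0] -= lr * err * x[0]
            w[1] -= lr * err * x[1]
            b -= lr * err
    return w, b


rng = random.Random(0)
# Small in-domain set plus a larger augmented (unlabelled) set; both are synthetic.
in_domain = [(rng.random(), rng.random()) for _ in range(20)]
augmented = [(rng.random(), rng.random()) for _ in range(200)]

# The teacher labels every input, so no human quality scores are needed here.
pool = in_domain + augmented
soft_targets = [teacher_score(x) for x in pool]

w, b = train_student(pool, soft_targets)
print("student weights:", w, "bias:", b)
```

Because the toy teacher is itself linear, the student can match it almost exactly. With a real teacher the student would only approximate it, and the size of that gap is what the paper evaluates.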
