Existing techniques often attempt to transfer knowledge from a powerful machine translation (MT) model to a speech translation (ST) model through elaborate designs, which typically require transcriptions as extra input during training. However, transcriptions are not always available, and how to improve ST performance without transcriptions, i.e., data efficiency, has rarely been studied in the literature. In this paper, we propose Decoupled Non-parametric Knowledge Distillation (DNKD) from the data perspective to improve data efficiency. Our method follows the knowledge distillation paradigm; however, instead of obtaining the teacher distribution from a sophisticated MT model, we construct it from a non-parametric datastore via k-Nearest-Neighbor (kNN) retrieval, which removes the dependence on transcriptions and MT models. We then decouple the classic knowledge distillation loss into target and non-target distillation to enhance the contribution of the knowledge among non-target logits, which constitutes the prominent "dark knowledge". Experiments on the MuST-C corpus show that the proposed method achieves consistent improvements over a strong baseline without requiring any transcription.
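For intuition, the sketch below illustrates the two ideas mentioned in the abstract: a teacher distribution assembled by kNN retrieval over a datastore of hidden-state/target-token pairs, and a distillation loss decoupled into target and non-target terms (a DKD-style decomposition). This is a minimal PyTorch sketch under assumed settings, not the paper's implementation; all names and hyperparameters (`knn_teacher_distribution`, `k`, `temperature`, `alpha`, `beta`) are illustrative.

```python
# Minimal sketch of a kNN-based teacher distribution and a decoupled KD loss.
# All function names and hyperparameters are illustrative assumptions.

import torch
import torch.nn.functional as F


def knn_teacher_distribution(query, datastore_keys, datastore_tokens,
                             vocab_size, k=8, temperature=10.0):
    """Turn the k nearest datastore entries into a soft teacher distribution."""
    # query: (hidden_dim,); datastore_keys: (N, hidden_dim); datastore_tokens: (N,) long
    dists = torch.cdist(query.unsqueeze(0), datastore_keys).squeeze(0)   # (N,)
    knn_dists, knn_idx = torch.topk(dists, k, largest=False)
    # Closer neighbours get larger weight via a softmax over negative distances.
    weights = F.softmax(-knn_dists / temperature, dim=0)                 # (k,)
    teacher = torch.zeros(vocab_size)
    teacher.scatter_add_(0, datastore_tokens[knn_idx], weights)          # aggregate by token
    return teacher                                                       # sums to 1


def decoupled_kd_loss(student_logits, teacher_probs, target_id,
                      alpha=1.0, beta=1.0):
    """Split KD into a target part and a non-target part (DKD-style)."""
    student_probs = F.softmax(student_logits, dim=-1)

    # Target distillation: match the binary (target vs. rest) probabilities.
    s_t, t_t = student_probs[target_id], teacher_probs[target_id]
    s_bin = torch.stack([s_t, 1.0 - s_t]).clamp_min(1e-8)
    t_bin = torch.stack([t_t, 1.0 - t_t]).clamp_min(1e-8)
    target_kd = F.kl_div(s_bin.log(), t_bin, reduction="sum")

    # Non-target distillation: match the distributions renormalised over
    # non-target tokens only, i.e., the "dark knowledge".
    mask = torch.ones_like(teacher_probs, dtype=torch.bool)
    mask[target_id] = False
    s_nt = (student_probs[mask] / (1.0 - s_t)).clamp_min(1e-8)
    t_nt = (teacher_probs[mask] / (1.0 - t_t)).clamp_min(1e-8)
    non_target_kd = F.kl_div(s_nt.log(), t_nt, reduction="sum")

    return alpha * target_kd + beta * non_target_kd
```

Weighting the two terms separately is what lets the non-target component be amplified independently of the (often dominant) target term; the specific weights and the retrieval temperature here are placeholders.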