Minimum Bayes Risk (MBR) decoding has emerged as a promising decoding algorithm in neural machine translation. However, MBR performs poorly with label smoothing, which is surprising, as label smoothing yields solid improvements with beam search and improves generalization across various tasks. In this work, we show that the issue arises from the inconsistency of label smoothing's effect on the token-level and sequence-level distributions. We demonstrate that even though label smoothing causes only a slight change at the token level, the sequence-level distribution becomes highly skewed. We coin this issue \emph{distributional over-smoothness}. To address it, we propose a simple and effective method, Distributional Cooling MBR (DC-MBR), which manipulates the entropy of the output distributions by tuning down the softmax temperature. We theoretically prove the equivalence between pre-tuning the label smoothing factor and distributional cooling. Experiments on NMT benchmarks validate that distributional cooling improves MBR's efficiency and effectiveness across various settings.
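To see why a slight token-level change can skew the sequence-level distribution, consider that sequence probability is a product of per-token probabilities. The numbers below are an illustrative back-of-the-envelope calculation, not figures from the paper:

```python
# Illustrative arithmetic (assumed values, not from the paper):
# label smoothing that lowers the per-token probability of the top
# token from 0.99 to 0.90 is a small token-level change, but over a
# 20-token sequence the greedy hypothesis's probability collapses.
p_sharp, p_smoothed = 0.99, 0.90   # per-token probability of the top token
length = 20
print(p_sharp ** length)     # ~0.82
print(p_smoothed ** length)  # ~0.12
```

The token-level gap is 0.09, yet the sequence-level probability shrinks by a factor of almost seven, which is the kind of sequence-level skew the abstract describes.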
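The following is a minimal sketch of the two ingredients named in the abstract: a temperature-cooled softmax and MBR decoding over a candidate set. It is an illustration under assumptions, not the authors' implementation; the function names (`cooled_softmax`, `mbr_decode`) and the toy unigram-overlap utility (a stand-in for BLEU or similar metrics) are hypothetical.

```python
import math

def cooled_softmax(logits, temperature=0.5):
    """Softmax with temperature T < 1: dividing logits by T sharpens
    the distribution, i.e. lowers ("cools") its entropy."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def overlap(hyp, ref):
    """Toy utility: unigram Jaccard overlap, standing in for BLEU."""
    hs, rs = set(hyp.split()), set(ref.split())
    return len(hs & rs) / max(len(hs | rs), 1)

def mbr_decode(candidates, probs, utility):
    """Return the candidate with the highest expected utility under
    the model distribution (Minimum Bayes Risk decoding)."""
    best, best_score = None, -math.inf
    for hyp in candidates:
        score = sum(p * utility(hyp, ref) for ref, p in zip(candidates, probs))
        if score > best_score:
            best, best_score = hyp, score
    return best

# Toy usage: three candidate translations with assumed model scores.
candidates = ["the cat sat", "the cat sat down", "a dog ran"]
logits = [2.0, 1.5, 0.2]
probs = cooled_softmax(logits, temperature=0.5)  # cooled distribution
print(mbr_decode(candidates, probs, overlap))
```

In this sketch, cooling happens before MBR: the candidate probabilities are computed from temperature-scaled logits, which concentrates mass on high-scoring hypotheses and counteracts the entropy added by label smoothing, in the spirit of the DC-MBR method described above.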