The standard paradigm of neural language generation adopts maximum likelihood estimation (MLE) as the optimization method. From a distributional view, MLE in fact minimizes the Kullback-Leibler divergence (KLD) between the distribution of the real data and that of the model. However, this approach forces the model to assign non-zero (sometimes large) probability mass to all training samples regardless of their quality. Moreover, in the attempt to cover the low-probability regions of the data distribution, the model systematically overestimates the probability of corrupted text sequences, which we conjecture is one of the main reasons for text degeneration during autoregressive decoding. To remedy this problem, we leverage the total variation distance (TVD), which is known for its robustness to outliers, and develop practical bounds to apply it to language generation. We then introduce the TaiLr objective, which balances the tradeoff in estimating TVD. Intuitively, TaiLr downweights real data samples that receive low model probabilities, with a tunable penalization intensity. Experimental results show that our method alleviates the overestimation of degenerated sequences without sacrificing diversity and improves generation quality on a wide range of text generation tasks.
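To make the downweighting intuition concrete, below is a minimal PyTorch sketch of a token-level reweighted MLE loss. It assumes a weighting of the form w = p / (γ + (1 − γ)·p), where p is the model probability of the gold token and γ ∈ (0, 1] controls the penalization intensity (γ → 1 recovers standard MLE); the function name `tailr_style_loss` and the exact weighting form are illustrative assumptions, not the authors' released implementation.

```python
# Sketch of a TaiLr-style reweighted token-level loss (illustrative, not the
# official implementation). Tokens the model assigns low probability get a
# smaller weight; gamma tunes how strongly they are penalized.
import torch
import torch.nn.functional as F

def tailr_style_loss(logits: torch.Tensor,
                     targets: torch.Tensor,
                     gamma: float = 0.1,
                     ignore_index: int = -100) -> torch.Tensor:
    """logits: (batch, seq_len, vocab); targets: (batch, seq_len) gold token ids."""
    log_probs = F.log_softmax(logits, dim=-1)                       # (B, T, V)
    gold_logp = log_probs.gather(
        -1, targets.clamp(min=0).unsqueeze(-1)).squeeze(-1)         # (B, T)
    p = gold_logp.exp()
    # Downweight low-probability gold tokens; detach so gradients flow only
    # through the log-likelihood term, not through the weight itself.
    weight = (p / (gamma + (1.0 - gamma) * p)).detach()
    mask = (targets != ignore_index).float()
    nll = -(weight * gold_logp) * mask
    return nll.sum() / mask.sum().clamp(min=1.0)
```

With γ close to 1 the weights approach 1 for every token and the loss reduces to ordinary cross-entropy; smaller γ shrinks the contribution of samples the model deems unlikely, which is the behavior the abstract describes.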