This work offers a novel theoretical perspective on why, despite numerous attempts, adversarial approaches to generative modeling (e.g., GANs) have not become as popular for certain generation tasks, particularly sequential tasks such as Natural Language Generation, as they are in other areas, such as Computer Vision. In particular, on sequential data such as text, maximum-likelihood approaches are used far more widely than GANs. We show that, while maximizing likelihood may seem inherently different from minimizing distinguishability, this distinction is largely artificial and holds only for limited models. We argue that minimizing KL-divergence (i.e., maximizing likelihood) is a more efficient approach to effectively minimizing the same distinguishability criterion that adversarial models seek to optimize. Our reductions show that minimizing distinguishability can be viewed as simply boosting likelihood for certain families of models, including n-gram models and neural networks with a softmax output layer. To achieve a full polynomial-time reduction, we introduce a novel next-token distinguishability model.
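As a brief illustration of the equivalence the abstract invokes (a standard identity, stated here with $p_{\mathrm{data}}$ and $p_\theta$ as illustrative notation rather than the paper's own): minimizing the KL-divergence from the data distribution to the model coincides with maximizing expected log-likelihood, because the entropy of the data does not depend on the model parameters $\theta$:

\[
\mathrm{KL}\!\left(p_{\mathrm{data}} \,\middle\|\, p_\theta\right)
= \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log \frac{p_{\mathrm{data}}(x)}{p_\theta(x)}\right]
= -H\!\left(p_{\mathrm{data}}\right) - \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log p_\theta(x)\right],
\]

so $\arg\min_\theta \mathrm{KL}(p_{\mathrm{data}} \,\|\, p_\theta) = \arg\max_\theta \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log p_\theta(x)\right]$, i.e., maximum-likelihood training.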