Algorithms for text generation in dialogue can be misguided. For example, in task-oriented settings, reinforcement learning that optimizes only for task success can lead to abysmal lexical diversity. We hypothesize this is due to a poor theoretical understanding of the objectives in text generation and their relation to the learning process (i.e., model training). To this end, we propose a new theoretical framework for learning to generate text in dialogue. Compared to existing theories of learning, our framework allows for analysis of the multi-faceted goals inherent to text generation. We use our framework to develop theoretical guarantees for learners that adapt to unseen data. As an example, we apply our theory to study data shift within a cooperative learning algorithm proposed for the GuessWhat?! visual dialogue game. From this insight, we propose a new algorithm, and we demonstrate empirically that our proposal improves both task success and the human-likeness of the generated text. Finally, we show that statistics from our theory are empirically predictive of multiple qualities of the generated dialogue, suggesting our theory is useful for model selection when human evaluations are not available.