As major progress is made in open-ended text generation, measuring how close machine-generated text is to human language remains a critical open problem. We introduce MAUVE, a comparison measure for open-ended text generation, which directly compares the learnt distribution from a text generation model to the distribution of human-written text using divergence frontiers. MAUVE scales up to modern text generation models by computing information divergences in a quantized embedding space. Through an extensive empirical study on three open-ended generation tasks, we find that MAUVE identifies known properties of generated text, scales naturally with model size, and correlates with human judgments, with fewer restrictions than existing distributional evaluation metrics.
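To make the construction concrete, here is a minimal sketch of the computation the abstract describes, assuming feature embeddings for the human-written and machine-generated text have already been extracted (the paper obtains these from a large language model). The helper names `quantized_histograms`, `kl`, and `mauve_score` are illustrative, and this is a sketch rather than the authors' reference implementation (an official package is released with the paper): quantize both samples with a shared k-means, form cluster histograms, trace the divergence curve over mixture weights, and integrate.

```python
import numpy as np
from sklearn.cluster import KMeans


def quantized_histograms(x_feats, y_feats, n_bins=10, seed=0):
    """Quantize both embedding sets with a shared k-means; return cluster histograms."""
    km = KMeans(n_clusters=n_bins, n_init=10, random_state=seed)
    km.fit(np.vstack([x_feats, y_feats]))
    p = np.bincount(km.predict(x_feats), minlength=n_bins) / len(x_feats)
    q = np.bincount(km.predict(y_feats), minlength=n_bins) / len(y_feats)
    return p, q


def kl(a, b):
    """KL(a || b) over the support of a, with the convention 0 log 0 = 0."""
    mask = a > 0
    return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))


def mauve_score(p, q, c=5.0, n_points=50):
    """Area under the divergence curve traced by mixtures R = lam*p + (1-lam)*q."""
    lams = np.linspace(1e-4, 1 - 1e-4, n_points)
    # Each mixture contributes one point (exp(-c*KL(Q||R)), exp(-c*KL(P||R))).
    curve = np.array([
        (np.exp(-c * kl(q, lam * p + (1 - lam) * q)),
         np.exp(-c * kl(p, lam * p + (1 - lam) * q)))
        for lam in lams
    ])
    # Sort by the x-coordinate and integrate with the trapezoid rule.
    order = np.argsort(curve[:, 0])
    return float(np.trapz(curve[order, 1], curve[order, 0]))
```

A score near 1 indicates that the quantized model and human distributions are close; heavier divergence on either side of the frontier pulls the curve, and hence the area, toward 0.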