Humans tend to follow the Uniform Information Density (UID) principle by distributing information evenly across utterances. We study whether decoding algorithms implicitly follow this principle, and under what conditions adherence to UID is desirable for dialogue generation. We generate responses with GPT-2 on the Persona-Chat dataset using different decoding algorithms and collect human judgments of their quality via Amazon Mechanical Turk. We find that (i) surprisingly, model-generated responses follow the UID principle to a greater extent than human responses, and (ii) decoding algorithms that promote UID do not generate higher-quality responses. Instead, when we control for surprisal, non-uniformity of information density correlates with response quality for responses with very low/high surprisal. Our findings indicate that encouraging non-uniform responses is a potential solution to the ``likelihood trap'' problem (quality degradation in very high-likelihood text). Our dataset, containing multiple candidate responses per dialogue history along with human-annotated quality ratings, is available at https://huggingface.co/datasets/saranya132/dialog_uid_gpt2.