Output length is critical to dialogue summarization systems. The length of a dialogue summary is determined by multiple factors, including dialogue complexity, summary objective, and personal preferences. In this work, we approach dialogue summary length from three perspectives. First, we analyze the length differences between existing models' outputs and the corresponding human references and find that summarization models tend to produce more verbose summaries because of their pretraining objectives. Second, we identify salient features for summary length prediction by comparing different model settings. Third, we experiment with a length-aware summarizer and show notable improvements over existing models when summary length is well incorporated. Analysis and experiments are conducted on the popular DialogSum and SAMSum datasets to validate our findings.