Abstractive text summarization has recently become a popular approach, but data hallucination remains a serious problem, including with quantitative data. We propose a set of probing tests to evaluate how well abstractive summarization models' encoders represent the quantitative values found in the input text. Our results show that in most cases, the encoders of recent SOTA-performing models struggle, relative to baselines, to provide embeddings that adequately represent quantitative values in the input; in particular, they outperform random representations in some, but surprisingly not all, cases. Under our assumptions, this suggests that the encoder's performance contributes to the quantity hallucination problem. One model type in particular, DistilBART-CDM, was observed to underperform randomly initialized representations in several experiments, and its performance relative to BERT suggests that standard pretraining and fine-tuning approaches for the summarization task may contribute to the underperformance of some encoders.
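The following is a minimal sketch of one such probing test, under the general setup described above: a frozen summarization encoder produces token embeddings, and a lightweight linear probe is trained to recover a property of a quantity (here, its order of magnitude) from the embedding of its token, with a randomly initialized copy of the same encoder as a baseline. The model checkpoint, the magnitude-bucket task, and the toy dataset are illustrative assumptions, not the paper's exact protocol.

```python
# Hedged sketch of a quantity-probing test over a frozen encoder.
# Checkpoint name, labels, and examples are assumptions for illustration.
import torch
from transformers import AutoTokenizer, AutoModel, AutoConfig
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

MODEL_NAME = "sshleifer/distilbart-cnn-12-6"  # a DistilBART-style checkpoint (assumed)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
pretrained = AutoModel.from_pretrained(MODEL_NAME).get_encoder().eval()
# Baseline: the same architecture with randomly initialized weights.
random_enc = AutoModel.from_config(AutoConfig.from_pretrained(MODEL_NAME)).get_encoder().eval()

def quantity_embedding(encoder, sentence, quantity):
    """Return the encoder embedding of the first subword of `quantity` in `sentence`."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**enc).last_hidden_state[0]  # (seq_len, dim)
    # Locate the quantity's first subword via character offsets.
    offsets = tokenizer(sentence, return_offsets_mapping=True)["offset_mapping"]
    start = sentence.index(quantity)
    idx = next(i for i, (s, e) in enumerate(offsets) if s <= start < e)
    return hidden[idx].numpy()

# Toy probing data: (sentence, quantity string, order-of-magnitude label).
examples = [
    ("The company hired 7 engineers last year.", "7", 0),
    ("Around 85 employees attended the meeting.", "85", 1),
    ("Revenue grew to 930 million dollars.", "930", 2),
    ("The town has roughly 4200 residents.", "4200", 3),
    # ... in practice, many examples with a held-out evaluation split
]

for name, encoder in [("pretrained", pretrained), ("random", random_enc)]:
    X = [quantity_embedding(encoder, s, q) for s, q, _ in examples]
    y = [label for _, _, label in examples]
    probe = LogisticRegression(max_iter=1000).fit(X, y)  # linear probe on frozen embeddings
    print(name, "train accuracy:", accuracy_score(y, probe.predict(X)))
```

Comparing the pretrained encoder's probe accuracy against the randomly initialized baseline (on held-out data, in a full experiment) is what licenses conclusions of the form "the encoder's embeddings do or do not represent quantitative values better than chance-level representations."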