An advantage of seq2seq abstractive summarization models is that they generate text in a free-form manner, but this flexibility makes it difficult to interpret model behavior. In this work, we analyze summarization decoders in both blackbox and whitebox ways by studying the entropy, or uncertainty, of the model's token-level predictions. For two strong pre-trained models, PEGASUS and BART, on two summarization datasets, we find a strong correlation between low prediction entropy and where the model copies tokens rather than generating novel text. The decoder's uncertainty also connects to factors like sentence position and syntactic distance between adjacent pairs of tokens, giving a sense of what factors make a context particularly selective for the model's next output token. Finally, we study the relationship of decoder uncertainty and attention behavior to understand how attention gives rise to these observed effects in the model. We show that uncertainty is a useful perspective for analyzing summarization and text generation models more broadly.
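As a minimal illustration of the token-level entropy studied here (a sketch, not the authors' code), the uncertainty of a decoder at one generation step is the Shannon entropy of its next-token distribution. A peaked distribution, as when the model is copying a source token, has low entropy; a flat distribution over many plausible continuations has high entropy:

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical toy distributions over a 4-token vocabulary:
copy_like = [0.97, 0.01, 0.01, 0.01]    # peaked: model nearly certain (copy-like step)
novel_like = [0.25, 0.25, 0.25, 0.25]   # flat: model uncertain (novel-generation step)

print(token_entropy(copy_like))   # low entropy
print(token_entropy(novel_like))  # high entropy, log(4) for a uniform distribution
```

In practice one would read these distributions off the decoder's softmaxed logits at each timestep; the toy vectors above are only for illustration.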