Extractive models usually formulate text summarization as extracting a fixed top-$k$ set of salient sentences from the document as the summary. Few works have exploited extracting finer-grained Elementary Discourse Units (EDUs), and those that do offer little analysis or justification for the choice of extractive unit. Further, the strategy of selecting a fixed top-$k$ salient sentences fits the summarization need poorly: the number of salient sentences varies across documents, so a common or best $k$ does not exist in practice. To fill these gaps, this paper first conducts a comparative analysis of oracle summaries based on EDUs and on sentences, providing both theoretical and experimental evidence to justify and quantify that EDUs yield summaries with higher automatic evaluation scores than sentences. Considering this merit of EDUs, the paper then proposes an EDU-level extractive model with Varying summary Lengths (EDU-VL) and develops the corresponding learning algorithm. EDU-VL learns, in an end-to-end manner, to encode EDUs in the document and predict their probabilities, to generate multiple candidate summaries of varying lengths based on different $k$ values, and to encode and score the candidate summaries. Finally, EDU-VL is evaluated on single- and multi-document benchmark datasets and improves ROUGE scores over state-of-the-art extractive models; further human evaluation suggests that EDU-constituent summaries maintain good grammaticality and readability.
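The varying-length candidate-generation step described above can be sketched as follows. This is a minimal illustration, not the paper's exact procedure: the function name, the top-$m$ pooling heuristic, and the subset enumeration are assumptions introduced here to show how different $k$ values produce candidate summaries from predicted EDU salience scores.

```python
from itertools import combinations

def generate_candidates(edu_probs, k_values, top_m=4):
    """Hypothetical sketch: for each k, restrict to a pool of the
    highest-scoring EDUs and enumerate all k-subsets as candidate
    summaries, with indices sorted back into document order."""
    # Rank EDU indices by predicted salience probability, descending.
    ranked = sorted(range(len(edu_probs)),
                    key=lambda i: edu_probs[i], reverse=True)
    candidates = []
    for k in k_values:
        pool = ranked[:max(top_m, k)]
        for combo in combinations(pool, k):
            # Restore document order so candidates read coherently.
            candidates.append(tuple(sorted(combo)))
    return candidates
```

Each candidate would then be re-encoded and scored by the model, so that summaries of different lengths compete directly rather than being fixed to one $k$.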