Compared with single-document summarization, abstractive Multi-Document Summarization (MDS) poses additional challenges in representing and covering its lengthy, interlinked sources. This study develops a Parallel Hierarchical Transformer (PHT) with attention alignment for MDS. By incorporating multi-head attention at both the word and paragraph levels, the hierarchical architecture of PHT allows better processing of dependencies at both the token and document levels. To guide decoding toward better coverage of the source documents, an attention-alignment mechanism is then introduced that calibrates beam search with predicted optimal attention distributions. A comprehensive evaluation on the WikiSum data tests the improvements the proposed architecture brings to MDS. By better handling intra- and cross-document information, the hierarchical model generates summaries of higher quality than other Transformer-based baselines, as indicated by both ROUGE and human evaluation, at relatively low computational cost.
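To make the attention-alignment idea concrete, the following is a minimal sketch of how beam-search scores could be calibrated against a predicted optimal attention distribution over source paragraphs. All names (aligned_score, gamma) and the KL-based penalty form are illustrative assumptions, not the paper's exact formulation.

import numpy as np

def kl_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """KL(p || q) between two attention distributions over source paragraphs."""
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def aligned_score(log_prob: float,
                  accumulated_attention: np.ndarray,
                  predicted_attention: np.ndarray,
                  gamma: float = 1.0) -> float:
    """Calibrated beam score: the hypothesis's model log-probability minus an
    attention-alignment penalty. gamma is a hypothetical weighting
    hyperparameter trading fluency off against source coverage."""
    return log_prob - gamma * kl_divergence(accumulated_attention, predicted_attention)

# Toy usage: three source paragraphs, two competing beam hypotheses.
predicted = np.array([0.5, 0.3, 0.2])           # predicted optimal coverage
hyp_a = (-4.1, np.array([0.48, 0.32, 0.20]))    # well-aligned attention
hyp_b = (-3.9, np.array([0.90, 0.05, 0.05]))    # fluent but poorly covered
best = max([hyp_a, hyp_b], key=lambda h: aligned_score(h[0], h[1], predicted))

Under this scoring, hypothesis A wins despite its slightly lower log-probability, because its accumulated attention tracks the predicted distribution far more closely, which is the intended effect of calibrating beam search toward better source coverage.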