We develop an abstractive summarization framework for multiple heterogeneous documents that does not rely on labeled data. Existing multi-document summarization methods operate on documents about the same topic; our framework instead processes documents that tell different stories. We also enhance an existing sentence fusion method with a unidirectional language model that prioritizes fused sentences with higher sentence probability, with the goal of improving readability. Lastly, we construct twelve dataset variations based on the CNN/Daily Mail and NewsRoom datasets, in which each document group contains a large and diverse collection of documents, and use them to compare our model against other baseline systems. Our experiments show that our framework outperforms current state-of-the-art methods in this more general setting.
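The abstract does not specify which unidirectional language model is used or how candidates are scored. The sketch below illustrates the general idea of ranking fused sentences by sentence probability. It makes several assumptions. A toy add-one-smoothed bigram model (the hypothetical `BigramLM`) stands in for the real left-to-right LM. Each sentence is scored by its chain-rule log-probability, log P(w_1..w_n) = Σ_i log P(w_i | w_{i-1}). An optional per-token normalization (the hypothetical `normalize` flag) keeps longer candidates from being penalized simply for their length. The sketch only ranks candidates; it does not perform the fusion itself.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple

BOS, EOS = "<s>", "</s>"


class BigramLM:
    """Toy left-to-right (unidirectional) bigram LM with add-one smoothing.

    Hypothetical stand-in for the framework's actual language model.
    """

    def __init__(self, corpus: Iterable[str]) -> None:
        self.context_counts: Counter = Counter()
        self.bigram_counts: Counter = Counter()
        vocab = set()
        for sentence in corpus:
            tokens = [BOS] + sentence.lower().split() + [EOS]
            self.context_counts.update(tokens[:-1])  # tokens used as left context
            self.bigram_counts.update(zip(tokens[:-1], tokens[1:]))
            vocab.update(tokens[1:])  # tokens the model can predict
        self.vocab_size = len(vocab) + 1  # +1 reserves mass for unseen words

    def log_prob(self, sentence: str) -> float:
        """Sum of log P(w_i | w_{i-1}); each word sees only its left context."""
        tokens = [BOS] + sentence.lower().split() + [EOS]
        total = 0.0
        for prev, word in zip(tokens[:-1], tokens[1:]):
            numerator = self.bigram_counts[(prev, word)] + 1
            denominator = self.context_counts[prev] + self.vocab_size
            total += math.log(numerator / denominator)
        return total


def rank_fused_candidates(
    lm: BigramLM, candidates: List[str], normalize: bool = True
) -> List[Tuple[str, float]]:
    """Return (candidate, score) pairs, most probable first (hypothetical helper)."""
    scored = []
    for candidate in candidates:
        score = lm.log_prob(candidate)
        if normalize:
            score /= len(candidate.split()) + 1  # per predicted token, incl. </s>
        scored.append((candidate, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


corpus = [
    "the storm hit the coast on monday",
    "the storm forced residents to leave the coast",
    "officials said the storm caused damage",
]
candidates = [
    "the storm hit the coast and caused damage",
    "coast the damage caused storm hit the and",
]
ranked = rank_fused_candidates(BigramLM(corpus), candidates)
```

Both candidates contain the same words. The fluent ordering reuses bigrams seen in the corpus, so it receives a higher score and is ranked first. A real system would replace `BigramLM` with a pretrained unidirectional LM and would need to choose between raw and length-normalized scores.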