In natural language processing, multi-document summarization (MDS) poses challenges beyond those of single-document summarization (SDS), including a larger search space and a greater risk of including redundant information. While advances in deep learning have produced several language models capable of summarization, training data specific to MDS remains relatively scarce. MDS approaches that require little or no pretraining, known as few-shot and zero-shot applications respectively, would therefore be useful additions to the current set of summarization tools. To explore one such approach, we devise a strategy for combining the outputs of state-of-the-art models using maximal marginal relevance (MMR), with the emphasis placed on query relevance rather than document diversity. Our MMR-based approach improves on some aspects of the current state-of-the-art results in both few-shot and zero-shot MDS while maintaining a state-of-the-art standard of output on all available metrics.
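To make the MMR selection step concrete, the following is a minimal sketch of query-focused MMR over candidate sentences. The similarity function (bag-of-words cosine), the trade-off weight `lambda_`, and the toy data are all illustrative assumptions, not details from the paper; a high `lambda_` reflects the abstract's emphasis on query relevance over diversity.

```python
# Minimal sketch of query-focused maximal marginal relevance (MMR).
# Assumptions (not from the paper): bag-of-words cosine similarity
# and lambda_ = 0.7 as the relevance/diversity trade-off.
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def mmr_select(query, candidates, k, lambda_=0.7):
    """Greedily pick k candidates by MMR score:

    score(c) = lambda_ * sim(c, query)
               - (1 - lambda_) * max(sim(c, s) for s in selected)

    The first term rewards query relevance; the second penalizes
    redundancy with already-selected candidates.
    """
    q = Counter(query.lower().split())
    vecs = [Counter(c.lower().split()) for c in candidates]
    selected = []
    while len(selected) < min(k, len(candidates)):
        best_i, best_score = None, float("-inf")
        for i, v in enumerate(vecs):
            if i in selected:
                continue
            relevance = cosine(v, q)
            redundancy = max((cosine(v, vecs[j]) for j in selected), default=0.0)
            score = lambda_ * relevance - (1 - lambda_) * redundancy
            if score > best_score:
                best_i, best_score = i, score
        selected.append(best_i)
    return [candidates[i] for i in selected]

query = "multi document summarization"
cands = [
    "multi document summarization is hard",
    "multi document summarization is hard",   # redundant duplicate
    "single document summarization differs",
]
# With lambda_=0.7 the duplicate is penalized and skipped in favor
# of the less redundant third candidate.
print(mmr_select(query, cands, k=2))
```

The greedy loop is the standard way MMR is applied: each iteration re-scores the remaining candidates against both the query and the growing summary, so redundancy penalties accumulate as the summary fills.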