In this paper, we describe our approach for the Podcast Summarisation challenge in TREC 2020. Given a podcast episode and its transcription, the goal is to generate a summary that captures the most important information in the content. Our approach consists of two steps: (1) filtering redundant or less informative sentences from the transcription using the attention of a hierarchical model; (2) applying a state-of-the-art text summarisation system (BART), fine-tuned on the Podcast data using a sequence-level reward function. Furthermore, we ensemble three and nine models for our submission runs. As our baseline, we also fine-tune the BART model on the Podcast data. The human evaluation by NIST shows that our best submission achieves 1.777 on the EGFB (Excellent/Good/Fair/Bad) scale, while the creator-provided descriptions score 1.291. Our system won the Spotify Podcast Summarisation Challenge in the TREC 2020 Podcast Track in both human and automatic evaluation.
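The first step, filtering the transcription before summarisation, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that one sentence-level attention weight per sentence has already been extracted from the hierarchical model, and it keeps the most-attended sentences that fit within a token budget. The function name `filter_sentences`, the 1024-token default (matching BART's standard input length), and the whitespace-based token count (a stand-in for BART's subword tokenizer) are all hypothetical choices made for this example.

```python
from typing import List, Sequence


def filter_sentences(
    sentences: Sequence[str],
    attention: Sequence[float],
    max_tokens: int = 1024,
) -> List[str]:
    """Keep the most-attended sentences that fit within a token budget.

    Hypothetical sketch: `attention` is assumed to hold one sentence-level
    weight per sentence from a hierarchical encoder. Token counts use a
    whitespace split as a stand-in for the real subword tokenizer.
    """
    if len(sentences) != len(attention):
        raise ValueError("need one attention weight per sentence")

    # Rank sentence indices by attention weight, highest first.
    ranked = sorted(range(len(sentences)), key=lambda i: attention[i], reverse=True)

    kept: List[int] = []
    used = 0
    for i in ranked:
        n_tokens = len(sentences[i].split())
        if used + n_tokens > max_tokens:
            continue  # skip this sentence; a shorter one may still fit
        kept.append(i)
        used += n_tokens

    # Restore transcript order so the summariser sees a coherent input.
    return [sentences[i] for i in sorted(kept)]


if __name__ == "__main__":
    transcript = [
        "hi everyone welcome back",
        "today we talk about sleep",
        "um yeah so",
        "sleep affects memory consolidation",
    ]
    weights = [0.1, 0.4, 0.05, 0.45]
    print(filter_sentences(transcript, weights, max_tokens=9))
```

With a 9-token budget, the two highest-weighted sentences fill the budget exactly, so the filler sentences are dropped and the output is `['today we talk about sleep', 'sleep affects memory consolidation']`. Restoring the original order after ranking is a design choice that keeps the filtered input readable as a transcript rather than as a list sorted by attention.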