Sports game summarization aims to generate sports news from live commentaries. However, existing datasets are all constructed through automated collection and cleaning, which leaves considerable noise in the data. Moreover, current work neglects the knowledge gap between live commentaries and sports news, which limits summarization performance. In this paper, we introduce K-SportsSum, a new dataset with two characteristics: (1) K-SportsSum collects data from a large number of games, yielding 7,854 commentary-news pairs, and employs a manual cleaning process to improve quality; (2) unlike existing datasets, to narrow the knowledge gap, K-SportsSum further provides a large-scale knowledge corpus that contains information on 523 sports teams and 14,724 sports players. We also introduce a knowledge-enhanced summarizer that uses both the live commentaries and this knowledge to generate sports news. Extensive experiments on the K-SportsSum and SportsSum datasets show that our model achieves new state-of-the-art performance. Qualitative analysis and a human study further verify that our model generates more informative sports news.