Sports game summarization aims to generate sports news from real-time commentaries. The task has attracted wide research attention but remains under-explored, probably due to the lack of corresponding English datasets. Therefore, in this paper, we release GOAL, the first English sports game summarization dataset. Specifically, GOAL contains 103 commentary-news pairs, where the average lengths of commentaries and news are 2724.9 and 476.3 words, respectively. Moreover, to support research in the semi-supervised setting, GOAL additionally provides 2,160 unlabeled commentary documents. Based on GOAL, we build and evaluate several baselines, including extractive and abstractive ones. The experimental results show that the task remains challenging. We hope our work will promote research on sports game summarization. The dataset has been released at https://github.com/krystalan/goal.