Soccer is more than just a game: it is a passion that transcends borders and unites people worldwide. From the roar of the crowds to the excitement of the commentators, every moment of a soccer match is a thrill. Yet, with so many games played simultaneously, fans cannot watch them all live. Notifications of key events can help, but they lack the engagement of live commentary, leaving fans feeling disconnected. To address this need, we propose a novel dense video captioning task that focuses on generating textual commentaries, each anchored to a single timestamp. To support this task, we present a challenging dataset of almost 37k timestamped commentaries across 715.9 hours of soccer broadcast videos. We also propose a first benchmark and baseline for this task, which highlight the difficulty of temporally anchoring commentaries while showing that meaningful commentaries can be generated. By giving broadcasters a tool to summarize the content of their videos with the same level of engagement as a live game, our method could help meet the needs of the many fans who follow their team but cannot always watch the game live. We believe our method has the potential to make soccer content more accessible and understandable to a wider audience, bringing the excitement of the game to more people.
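The sketch below is a minimal Python illustration of how single-anchored captioning differs from classic dense video captioning. It assumes that each commentary is stored as a (timestamp, text) pair rather than a (start, end, text) segment. The `Commentary` class, the greedy `match_within_tolerance` helper and the tolerance window are hypothetical and do not reproduce the benchmark's actual evaluation protocol. They only show why a single anchor is judged by its distance in time to the reference, not by segment overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commentary:
    """Hypothetical record: one caption anchored to a single timestamp (seconds),
    instead of the (start, end) segment used in classic dense video captioning."""
    timestamp: float
    text: str


def match_within_tolerance(predictions, ground_truth, tolerance):
    """Hypothetical greedy matcher: pair each ground-truth commentary (in time order)
    with the closest not-yet-used prediction lying within +/- `tolerance` seconds.

    Returns a list of (ground_truth, prediction) pairs; unmatched items are dropped.
    """
    used = set()
    pairs = []
    for gt in sorted(ground_truth, key=lambda c: c.timestamp):
        best_idx, best_dist = None, None
        for i, pred in enumerate(predictions):
            if i in used:
                continue
            dist = abs(pred.timestamp - gt.timestamp)
            # Only anchors close enough in time can count as a temporal hit.
            if dist <= tolerance and (best_dist is None or dist < best_dist):
                best_idx, best_dist = i, dist
        if best_idx is not None:
            used.add(best_idx)
            pairs.append((gt, predictions[best_idx]))
    return pairs


if __name__ == "__main__":
    gt = [Commentary(12.0, "Corner kick for the home side."),
          Commentary(95.5, "Goal! A header into the top corner.")]
    preds = [Commentary(14.0, "A corner is taken."),
             Commentary(60.0, "Foul in midfield."),
             Commentary(97.0, "He scores with his head!")]
    matched = match_within_tolerance(preds, gt, tolerance=5.0)
    precision = len(matched) / len(preds)
    recall = len(matched) / len(gt)
    print(f"matched={len(matched)} precision={precision:.2f} recall={recall:.2f}")
```

In this toy run, the prediction at 60.0 s has no reference commentary within 5 seconds, so it lowers precision. The matched pairs would then be passed to a text-similarity metric to judge caption quality. Separating temporal matching from text scoring in this way is one plausible design. Tightening `tolerance` makes the anchoring requirement stricter.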