Understanding broadcast videos is a challenging task in computer vision, as it requires generic reasoning capabilities to appreciate the content offered by the video editing. In this work, we propose SoccerNet-v2, a novel large-scale corpus of manual annotations for the SoccerNet video dataset, along with open challenges to encourage more research in soccer understanding and broadcast production. Specifically, we release around 300k annotations within SoccerNet's 500 untrimmed broadcast soccer videos. We extend current tasks in the realm of soccer to include action spotting and camera shot segmentation with boundary detection, and we define a novel replay grounding task. For each task, we provide and discuss benchmark results, reproducible with our open-source adapted implementations of the most relevant works in the field. SoccerNet-v2 is presented to the broader research community to help push computer vision closer to automatic solutions for more general video understanding and production purposes.