Recently, video text detection, tracking, and recognition in natural scenes have become increasingly popular in the computer vision community. However, most existing algorithms and benchmarks focus on common text cases (e.g., normal size and density) and single scenarios, while ignoring extreme video text challenges, i.e., dense and small text in various scenarios. In this competition report, we establish a video text reading benchmark, DSText, which focuses on dense and small text reading challenges in videos across various scenarios. Compared with previous datasets, the proposed dataset mainly introduces three new challenges: 1) dense video texts, a new challenge for video text spotters; 2) a high proportion of small texts; and 3) various new scenarios, e.g., games and sports. The proposed DSText includes 100 video clips from 12 open scenarios and supports two tasks, i.e., video text tracking (Task 1) and end-to-end video text spotting (Task 2). During the competition period (opened on 15th February 2023 and closed on 20th March 2023), a total of 24 teams participated in the two proposed tasks, with around 30 valid submissions. In this article, we describe the dataset and its detailed statistics, the tasks, the evaluation protocols, and a summary of the results of the ICDAR 2023 DSText competition. Moreover, we hope the benchmark will promote video text research in the community.