Perceiving text is crucial to understanding the semantics of outdoor scenes, and is hence a critical requirement for building intelligent systems for driver assistance and self-driving. Most existing datasets for text detection and recognition comprise still images and were compiled primarily with text in mind. This paper introduces "RoadText-1K", a new dataset for text in driving videos. The dataset is 20 times larger than the largest existing dataset for text in videos. It comprises 1000 video clips of driving, collected without any bias towards text, with annotations for text bounding boxes and transcriptions in every frame. State-of-the-art methods for text detection, recognition and tracking are evaluated on the new dataset, and the results highlight the challenges posed by unconstrained driving videos compared to existing datasets. This suggests that RoadText-1K is well suited for the research and development of reading systems robust enough to be incorporated into more complex downstream tasks such as driver assistance and self-driving. The dataset can be found at http://cvit.iiit.ac.in/research/projects/cvit-projects/roadtext-1k
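Since the abstract describes per-frame annotations of text bounding boxes and transcriptions, the minimal sketch below illustrates how such annotations might be represented and iterated over. It is only a sketch under assumptions: the frame-indexed JSON layout, the field names (`track_id`, `bbox`, `transcription`), the helper `load_annotations`, and the file path are all hypothetical illustrations, not the dataset's actual release format.

```python
import json
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical record for one annotated text instance in a single frame.
# Field names are illustrative only; the actual RoadText-1K annotation
# schema is defined by the dataset release, not by this sketch.
@dataclass
class TextInstance:
    track_id: int        # identity of the text object across frames
    bbox: List[float]    # [x_min, y_min, x_max, y_max] in pixels
    transcription: str   # ground-truth text for the box

def load_annotations(path: str) -> Dict[int, List[TextInstance]]:
    """Load per-frame annotations from a hypothetical frame-indexed JSON file."""
    with open(path) as f:
        raw = json.load(f)
    return {
        int(frame_idx): [TextInstance(**inst) for inst in instances]
        for frame_idx, instances in raw.items()
    }

if __name__ == "__main__":
    # Hypothetical per-video annotation file; every frame carries its own boxes.
    anns = load_annotations("roadtext1k/video_0001.json")
    for frame_idx, instances in sorted(anns.items()):
        for inst in instances:
            print(frame_idx, inst.track_id, inst.bbox, inst.transcription)
```

Representing the annotations frame by frame, with a persistent track identifier per text object, matches the abstract's description of every-frame labels and would support evaluating detection, recognition and tracking from the same records.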