带文本的房间:用于覆盖文本检测的数据集 (Rooms with Text: A Dataset for Overlaying Text Detection)

In this paper, we introduce a new dataset of room interior pictures with overlaying and scene text, totalling to 4836 annotated images in 25 product categories. We provide details on the collection and annotation process of our dataset, and analyze its statistics. Furthermore, we propose a baseline method for overlaying text detection, that leverages the character region-aware text detection framework to guide the classification model. We validate our approach and show its efficiency in terms of binary classification metrics, reaching the final performance of 0.95 F1 score, with false positive and false negative rates of 0.02 and 0.06 correspondingly.

翻译：在本文中,我们引入了一套新的室内室内照片数据集,内含重叠和场景文本,总共在25个产品类别中提供了4836张附加说明的图像,我们详细介绍了我们数据集的收集和注释过程,并分析了其统计数据。此外,我们提出了一种覆盖文本检测基线方法,利用字符区域识别文本检测框架来指导分类模式。我们验证了我们的方法,并显示了其二元分类指标的效率,达到了0.95 F1分的最后性能,正值和负值分别为0.02和0.06,相应的负值为0.02和0.06。

相关内容

数据集

关注 0

数据集，又称为资料集、数据集合或资料集合，是一种由数据所组成的集合。
Data set（或dataset）是一个数据的集合，通常以表格形式出现。每一列代表一个特定变量。每一行都对应于某一成员的数据集的问题。它列出的价值观为每一个变量，如身高和体重的一个物体或价值的随机数。每个数值被称为数据资料。对应于行数，该数据集的数据可能包括一个或多个成员。

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日