Scalable Mask Annotation for Video Text Spotting


May 2, 2023


Haibin He, Jing Zhang, Mengyang Xu, Juhua Liu, Bo Du, Dacheng Tao

From arXiv. Technical report; work in progress.

Video text spotting refers to localizing, recognizing, and tracking textual elements such as captions, logos, license plates, signs, and other forms of text within consecutive video frames. However, current datasets for this task rely on quadrilateral ground-truth annotations, which can include excessive background content and give inaccurate text boundaries. Furthermore, methods trained on these datasets often produce predictions in the form of quadrilateral boxes, which limits their ability to handle complex scenarios such as dense or curved text. To address these issues, we propose SAMText, a scalable mask annotation pipeline for video text spotting. SAMText leverages the SAM model to generate mask annotations for scene text images or video frames at scale. Using SAMText, we have created a large-scale dataset, SAMText-9M, that contains over 2,400 video clips sourced from existing datasets and over 9 million mask annotations. We have also conducted a thorough statistical analysis of the generated masks and their quality, identifying several research topics that could be further explored based on this dataset. The code and dataset will be released at https://github.com/ViTAE-Transformer/SAMText.
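
To make the concern about quadrilateral annotations concrete, the sketch below measures how much of a quadrilateral is background once a pixel-level text mask is available. It is only an illustration: the abstract does not say how SAM is prompted, so deriving an axis-aligned box from the quadrilateral is an assumption, and the function names (`quad_to_box`, `pixels_in_convex_quad`, `background_fraction`) and the hand-made text mask are hypothetical. In practice the text mask would come from a segmentation model such as SAM.

```python
def quad_to_box(quad):
    """Axis-aligned box (x_min, y_min, x_max, y_max) enclosing a quadrilateral.

    Assumption: a box like this could serve as a prompt for a
    promptable segmentation model; the abstract does not specify this.
    """
    xs = [x for x, _ in quad]
    ys = [y for _, y in quad]
    return (min(xs), min(ys), max(xs), max(ys))


def pixels_in_convex_quad(quad, width, height):
    """Set of (x, y) pixels whose centers lie inside a convex quadrilateral."""
    inside = set()
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5  # pixel center
            crosses = []
            for i in range(4):
                x0, y0 = quad[i]
                x1, y1 = quad[(i + 1) % 4]
                # Sign of the cross product tells which side of the edge we are on.
                crosses.append((x1 - x0) * (py - y0) - (y1 - y0) * (px - x0))
            # Inside a convex polygon, the point is on the same side of every edge.
            if all(c >= 0 for c in crosses) or all(c <= 0 for c in crosses):
                inside.add((x, y))
    return inside


def background_fraction(quad_pixels, text_pixels):
    """Share of the quadrilateral's pixels not covered by the text mask."""
    if not quad_pixels:
        return 0.0
    return len(quad_pixels - text_pixels) / len(quad_pixels)


# Hypothetical example: a quadrilateral annotation on a 12x8 frame.
quad = [(2, 2), (10, 2), (10, 6), (2, 6)]
quad_pixels = pixels_in_convex_quad(quad, width=12, height=8)

# Hand-made stand-in for a mask annotation (tighter than the quadrilateral).
text_pixels = {(x, y) for x in range(3, 9) for y in range(3, 5)}

print(quad_to_box(quad))
print(background_fraction(quad_pixels, text_pixels))
```

A high background fraction indicates that the quadrilateral encloses many non-text pixels, which is the kind of looseness that mask annotations are meant to reduce.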
