中国传统合成数据集,用标签数据校验,以识别场景文字 (Traditional Chinese Synthetic Datasets Verified with Labeled Data for Scene Text Recognition) - 专知论文

会员服务 ·

0

标注 · MoDELS · 数据集 · 数据标签 · Better ·

2021 年 11 月 26 日

Traditional Chinese Synthetic Datasets Verified with Labeled Data for Scene Text Recognition

翻译：中国传统合成数据集,用标签数据校验,以识别场景文字

Yi-Chang Chen,Yu-Chuan Chang,Yen-Cheng Chang,Yi-Ren Yeh

Scene text recognition (STR) has been widely studied in academia and industry. Training a text recognition model often requires a large amount of labeled data, but data labeling can be difficult, expensive, or time-consuming, especially for Traditional Chinese text recognition. To the best of our knowledge, public datasets for Traditional Chinese text recognition are lacking. This paper presents a framework for a Traditional Chinese synthetic data engine which aims to improve text recognition model performance. We generated over 20 million synthetic data and collected over 7,000 manually labeled data TC-STR 7k-word as the benchmark. Experimental results show that a text recognition model can achieve much better accuracy either by training from scratch with our generated synthetic data or by further fine-tuning with TC-STR 7k-word.

翻译：培训文本识别模型往往需要大量贴标签的数据,但数据标识可能困难、昂贵或费时,特别是传统的中文文本识别。据我们所知,缺少用于传统中文文本识别的公共数据集。本文为传统的中国合成数据引擎提供了一个框架,目的是改进文本识别模型的性能。我们生成了2 000多万个合成数据,并收集了7 000多个手工贴标签的数据(TC-STR 7k字)作为基准。实验结果显示,文本识别模型可以通过从零到零地培训我们生成的合成数据,或者通过进一步调整TC-STR 7k字来实现更高的准确性。

0

相关内容

神经网络序列数据建模，229页ppt，Modeling Sequential Data with Neural Nets

神经网络序列数据建模，229页ppt，Modeling Sequential Data with Neural Nets

专知会员服务

67+阅读 · 2020年7月25日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【AAAI2020-清华大学】张量图卷积网络文本分类，Tensor Graph Convolutional Networks for Text Classification

【AAAI2020-清华大学】张量图卷积网络文本分类，Tensor Graph Convolutional Networks for Text Classification

专知会员服务

76+阅读 · 2020年1月16日

【ICDAR2019教程】用于文档分析、文本识别和语言建模的深度学习（Deep Learning for Document Analysis, Text Recognition, and Language Modeling）

【ICDAR2019教程】用于文档分析、文本识别和语言建模的深度学习（Deep Learning for Document Analysis, Text Recognition, and Language Modeling）

专知会员服务

22+阅读 · 2019年12月12日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

31+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

学术会议 | 知识图谱顶会 ISWC 征稿：Poster/Demo

学术会议 | 知识图谱顶会 ISWC 征稿：Poster/Demo

开放知识图谱

5+阅读 · 2019年4月16日

Awesome-Chinese-NLP：中文自然语言处理相关资料

Awesome-Chinese-NLP：中文自然语言处理相关资料

AINLP

30+阅读 · 2019年2月17日

TCN v2 + 3Dconv 运动信息

TCN v2 + 3Dconv 运动信息

CreateAMind

4+阅读 · 2019年1月8日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

已删除

将门创投

3+阅读 · 2018年8月21日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【推荐】SVM实例教程

【推荐】SVM实例教程

机器学习研究会

17+阅读 · 2017年8月26日

MATCH: Metadata-Aware Text Classification in A Large Hierarchy

Arxiv

12+阅读 · 2021年2月15日

Text Detection and Recognition in the Wild: A Review

Arxiv

20+阅读 · 2020年6月8日

FocalMix: Semi-Supervised Learning for 3D Medical Image Detection

FocalMix: Semi-Supervised Learning for 3D Medical Image Detection

Arxiv

10+阅读 · 2020年3月20日

SwapText: Image Based Texts Transfer in Scenes

SwapText: Image Based Texts Transfer in Scenes

Arxiv

4+阅读 · 2020年3月18日

Scene Text Detection and Recognition: The Deep Learning Era

Scene Text Detection and Recognition: The Deep Learning Era

Arxiv

27+阅读 · 2019年9月5日

Pre-Training with Whole Word Masking for Chinese BERT

Arxiv

11+阅读 · 2019年6月19日

Weakly-Supervised Hierarchical Text Classification

Arxiv

3+阅读 · 2018年12月29日

Adversarial Meta-Learning

Arxiv

7+阅读 · 2018年6月8日

Train Once, Test Anywhere: Zero-Shot Learning for Text Classification

Arxiv

4+阅读 · 2017年12月23日

CNN-RNN: A Unified Framework for Multi-label Image Classification

Arxiv

7+阅读 · 2016年4月15日

VIP会员

文章信息

相关主题

相关VIP内容

神经网络序列数据建模，229页ppt，Modeling Sequential Data with Neural Nets

神经网络序列数据建模，229页ppt，Modeling Sequential Data with Neural Nets

专知会员服务

67+阅读 · 2020年7月25日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【AAAI2020-清华大学】张量图卷积网络文本分类，Tensor Graph Convolutional Networks for Text Classification

【AAAI2020-清华大学】张量图卷积网络文本分类，Tensor Graph Convolutional Networks for Text Classification

专知会员服务

76+阅读 · 2020年1月16日

【ICDAR2019教程】用于文档分析、文本识别和语言建模的深度学习（Deep Learning for Document Analysis, Text Recognition, and Language Modeling）

【ICDAR2019教程】用于文档分析、文本识别和语言建模的深度学习（Deep Learning for Document Analysis, Text Recognition, and Language Modeling）

专知会员服务

22+阅读 · 2019年12月12日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

31+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

【博士论文】在低维与高维空间中对潜在表征的分析、建模与变换

《美军使用大语言模型技术生成领域特定文档》2025最新379页

【NeurIPS 2025】以语言为中心的全模态表征学习的可扩展性研究

智能体化多模态大语言模型综述

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

学术会议 | 知识图谱顶会 ISWC 征稿：Poster/Demo

学术会议 | 知识图谱顶会 ISWC 征稿：Poster/Demo

开放知识图谱

5+阅读 · 2019年4月16日

Awesome-Chinese-NLP：中文自然语言处理相关资料

Awesome-Chinese-NLP：中文自然语言处理相关资料

AINLP

30+阅读 · 2019年2月17日

TCN v2 + 3Dconv 运动信息

TCN v2 + 3Dconv 运动信息

CreateAMind

4+阅读 · 2019年1月8日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

已删除

将门创投

3+阅读 · 2018年8月21日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【推荐】SVM实例教程

【推荐】SVM实例教程

机器学习研究会

17+阅读 · 2017年8月26日

相关论文

MATCH: Metadata-Aware Text Classification in A Large Hierarchy

Arxiv

12+阅读 · 2021年2月15日

Text Detection and Recognition in the Wild: A Review

Arxiv

20+阅读 · 2020年6月8日

FocalMix: Semi-Supervised Learning for 3D Medical Image Detection

FocalMix: Semi-Supervised Learning for 3D Medical Image Detection

Arxiv

10+阅读 · 2020年3月20日

SwapText: Image Based Texts Transfer in Scenes

SwapText: Image Based Texts Transfer in Scenes

Arxiv

4+阅读 · 2020年3月18日

Scene Text Detection and Recognition: The Deep Learning Era

Scene Text Detection and Recognition: The Deep Learning Era

Arxiv

27+阅读 · 2019年9月5日

Pre-Training with Whole Word Masking for Chinese BERT

Arxiv

11+阅读 · 2019年6月19日

Weakly-Supervised Hierarchical Text Classification

Arxiv

3+阅读 · 2018年12月29日

Adversarial Meta-Learning

Arxiv

7+阅读 · 2018年6月8日

Train Once, Test Anywhere: Zero-Shot Learning for Text Classification

Arxiv

4+阅读 · 2017年12月23日

CNN-RNN: A Unified Framework for Multi-label Image Classification

Arxiv

7+阅读 · 2016年4月15日

微信扫码咨询专知VIP会员