We present the largest publicly available synthetic OCR benchmark dataset for Indic languages. The collection contains a total of 90k images and their ground truth, covering 23 Indic languages. Validating OCR models for Indic languages requires a large amount of diverse data in order to produce robust and reliable models. Collecting such a volume of data would otherwise be difficult, but with synthetic generation it becomes far easier. This is of great value to fields such as Computer Vision and Image Processing, where model development becomes easier once an initial synthetic dataset is available. Synthetic generation also offers the flexibility to adjust the nature and environment of the data as and when required to improve model performance. Accurately labelling real-world data is often expensive, whereas accurate ground truth for synthetic data can be obtained easily, since the labels are known at generation time.
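To illustrate the core idea behind synthetic OCR data of this kind, the following minimal Python sketch renders a ground-truth string into an image using an Indic-script font. It is not the authors' actual pipeline; the font file, sample text, and output name are placeholder assumptions, and a real generator would additionally vary fonts, backgrounds, and distortions.

```python
# Minimal sketch of synthetic OCR sample generation (illustrative only,
# not the dataset's actual pipeline). Requires Pillow and a Unicode-capable
# font file; complex-script shaping may need Pillow built with libraqm.
from PIL import Image, ImageDraw, ImageFont

def render_text_image(text, font_path, font_size=32, padding=10):
    """Render `text` onto a white canvas and return the image."""
    font = ImageFont.truetype(font_path, font_size)
    # Measure the rendered text to size the canvas.
    left, top, right, bottom = font.getbbox(text)
    width = right - left + 2 * padding
    height = bottom - top + 2 * padding
    image = Image.new("RGB", (width, height), "white")
    draw = ImageDraw.Draw(image)
    draw.text((padding - left, padding - top), text, font=font, fill="black")
    return image

if __name__ == "__main__":
    # Hypothetical Devanagari sample; any suitable .ttf font works.
    img = render_text_image("नमस्ते दुनिया", "NotoSansDevanagari-Regular.ttf")
    img.save("sample_0001.png")
    # The ground-truth label is simply the string that was rendered.
```

Because the text is supplied by the generator, every image is paired with an exact label for free, which is what makes large-scale synthetic benchmarks of this kind feasible.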