EraseNet:一个经常性的受监督文件清理残余网络 (EraseNet: A Recurrent Residual Network for Supervised Document Cleaning) - 专知论文

会员服务 ·

0

残差网络 · OCR · Networking · 监督 · 噪声 ·

2022 年 10 月 3 日

EraseNet: A Recurrent Residual Network for Supervised Document Cleaning

翻译：EraseNet:一个经常性的受监督文件清理残余网络

Yashowardhan Shinde,Kishore Kulkarni

from arxiv, 10 pages, 5 figures, attempting for publication in International Journal on Document Analysis and Recognition (IJDAR)

Document denoising is considered one of the most challenging tasks in computer vision. There exist millions of documents that are still to be digitized, but problems like document degradation due to natural and man-made factors make this task very difficult. This paper introduces a supervised approach for cleaning dirty documents using a new fully convolutional auto-encoder architecture. This paper focuses on restoring documents with discrepancies like deformities caused due to aging of a document, creases left on the pages that were xeroxed, random black patches, lightly visible text, etc., and also improving the quality of the image for better optical character recognition system (OCR) performance. Removing noise from scanned documents is a very important step before the documents as this noise can severely affect the performance of an OCR system. The experiments in this paper have shown promising results as the model is able to learn a variety of ordinary as well as unusual noises and rectify them efficiently.

翻译：文档拆解被认为是计算机视觉中最具挑战性的任务之一。有数百万个文件仍有待数字化,但像文件因自然和人为因素而退化这样的问题使得这项工作非常困难。本文介绍了使用新的全进化自动编码结构来清洁脏文件的监管方法。本文侧重于修复文件,其差异如文件老化、被破碎的页面上的折痕、随机黑斑、轻可见的文字等等,以及提高图像质量,以便改进光学字符识别系统(OCR)的性能。从扫描文件中去除噪音是文件之前的一个非常重要的步骤,因为这种噪音会严重影响OCR系统的性能。本文的实验显示,由于模型能够学习各种普通的和不寻常的噪音,并有效地加以纠正,结果令人乐观。

0

相关内容

残差网络

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

中国图象图形学学会CSIG

0+阅读 · 2021年12月17日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

【ICIG2021】Latest News & Announcements of the Industry Talk2

【ICIG2021】Latest News & Announcements of the Industry Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年7月29日

【ICIG2021】Latest News & Announcements of the Industry Talk1

【ICIG2021】Latest News & Announcements of the Industry Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年7月28日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

重离子储存环CSRe上激光冷却相对论能量类锂12C3+离子束的实验研究

国家自然科学基金

0+阅读 · 2015年12月31日

基于SURE/PURE准则的图像盲反卷积算法研究

国家自然科学基金

3+阅读 · 2013年12月31日

近、中红外波段含Sb DWELL结构激光器研究

国家自然科学基金

0+阅读 · 2012年12月31日

面向多时相腹部CT图像的多器官计算机辅助诊断关键技术研究

国家自然科学基金

0+阅读 · 2012年12月31日

RNA结合蛋白CUG-BP1在肌肉萎缩中的生物学功能与机制

国家自然科学基金

0+阅读 · 2012年12月31日

MRTF-A调控CYR61介导间充质干细胞向内皮细胞分化的分子机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

面向会议评审的无线自组网络系统研究

国家自然科学基金

0+阅读 · 2011年8月31日

转录因子AP-2α22312;UVB诱发皮肤癌中的作用和机制研究

国家自然科学基金

0+阅读 · 2010年12月31日

耐辐射球菌DNA损伤修复蛋白质RecQ的HRDC结构域结构和功能研究

国家自然科学基金

0+阅读 · 2009年12月31日

磁性Pickering乳液界面流变学研究

国家自然科学基金

0+阅读 · 2008年12月31日

Zero-Error Coding for Computing with Encoder Side-Information

Arxiv

0+阅读 · 2022年11月7日

Computing and Exploiting Document Structure to Improve Unsupervised Extractive Summarization of Legal Case Decisions

Arxiv

0+阅读 · 2022年11月6日

Hierarchical Multi-Label Classification of Scientific Documents

Arxiv

0+阅读 · 2022年11月5日

Fine-tuned Generative Adversarial Network-based Model for Medical Images Super-Resolution

Arxiv

0+阅读 · 2022年11月3日

Pairing optimization via statistics: Algebraic structure in pairing problems and its application to performance enhancement

Arxiv

0+阅读 · 2022年11月3日

Scaling Properties of Deep Residual Networks

Arxiv

13+阅读 · 2021年5月25日

Adaptive Attentional Network for Few-Shot Knowledge Graph Completion

Arxiv

17+阅读 · 2020年10月19日

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

Arxiv

12+阅读 · 2020年2月19日

nnU-Net: Self-adapting Framework for U-Net-Based Medical Image Segmentation

Arxiv

12+阅读 · 2018年9月27日

Order-Free RNN with Visual Attention for Multi-Label Classification

Arxiv

16+阅读 · 2017年12月20日

VIP会员

文章信息

相关主题

相关VIP内容

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【NeurIPS 2025】视觉指令瓶颈微调

什么是模块化开放系统方法（MOSA）？从美陆军新型倾转旋翼机视角解读

【牛津博士论文】面向视觉、物理与语言应用的可信机器学习模型

医学领域大型语言模型的新进展

相关资讯

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

中国图象图形学学会CSIG

0+阅读 · 2021年12月17日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

【ICIG2021】Latest News & Announcements of the Industry Talk2

【ICIG2021】Latest News & Announcements of the Industry Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年7月29日

【ICIG2021】Latest News & Announcements of the Industry Talk1

【ICIG2021】Latest News & Announcements of the Industry Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年7月28日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

相关论文

Zero-Error Coding for Computing with Encoder Side-Information

Arxiv

0+阅读 · 2022年11月7日

Computing and Exploiting Document Structure to Improve Unsupervised Extractive Summarization of Legal Case Decisions

Arxiv

0+阅读 · 2022年11月6日

Hierarchical Multi-Label Classification of Scientific Documents

Arxiv

0+阅读 · 2022年11月5日

Fine-tuned Generative Adversarial Network-based Model for Medical Images Super-Resolution

Arxiv

0+阅读 · 2022年11月3日

Pairing optimization via statistics: Algebraic structure in pairing problems and its application to performance enhancement

Arxiv

0+阅读 · 2022年11月3日

Scaling Properties of Deep Residual Networks

Arxiv

13+阅读 · 2021年5月25日

Adaptive Attentional Network for Few-Shot Knowledge Graph Completion

Arxiv

17+阅读 · 2020年10月19日

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

Arxiv

12+阅读 · 2020年2月19日

nnU-Net: Self-adapting Framework for U-Net-Based Medical Image Segmentation

Arxiv

12+阅读 · 2018年9月27日

Order-Free RNN with Visual Attention for Multi-Label Classification

Arxiv

16+阅读 · 2017年12月20日

相关基金

重离子储存环CSRe上激光冷却相对论能量类锂12C3+离子束的实验研究

国家自然科学基金

0+阅读 · 2015年12月31日

基于SURE/PURE准则的图像盲反卷积算法研究

国家自然科学基金

3+阅读 · 2013年12月31日

近、中红外波段含Sb DWELL结构激光器研究

国家自然科学基金

0+阅读 · 2012年12月31日

面向多时相腹部CT图像的多器官计算机辅助诊断关键技术研究

国家自然科学基金

0+阅读 · 2012年12月31日

RNA结合蛋白CUG-BP1在肌肉萎缩中的生物学功能与机制

国家自然科学基金

0+阅读 · 2012年12月31日

MRTF-A调控CYR61介导间充质干细胞向内皮细胞分化的分子机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

面向会议评审的无线自组网络系统研究

国家自然科学基金

0+阅读 · 2011年8月31日

转录因子AP-2α22312;UVB诱发皮肤癌中的作用和机制研究

国家自然科学基金

0+阅读 · 2010年12月31日

耐辐射球菌DNA损伤修复蛋白质RecQ的HRDC结构域结构和功能研究

国家自然科学基金

0+阅读 · 2009年12月31日

磁性Pickering乳液界面流变学研究

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员