解决韩文OCR中阶级不平衡问题的性能分解 (Character decomposition to resolve class imbalance problem in Hangul OCR) - 专知论文

会员服务 ·

0

OCR · Performer · 类别 · 讲稿 · Neural Networks ·

2022 年 8 月 12 日

Character decomposition to resolve class imbalance problem in Hangul OCR

翻译：解决韩文OCR中阶级不平衡问题的性能分解

Geonuk Kim,Jaemin Son,Kanghyu Lee,Jaesik Min

from arxiv, ECCV 2022 TiE workshop

We present a novel approach to OCR(Optical Character Recognition) of Korean character, Hangul. As a phonogram, Hangul can represent 11,172 different characters with only 52 graphemes, by describing each character with a combination of the graphemes. As the total number of the characters could overwhelm the capacity of a neural network, the existing OCR encoding methods pre-define a smaller set of characters that are frequently used. This design choice naturally compromises the performance on long-tailed characters in the distribution. In this work, we demonstrate that grapheme encoding is not only efficient but also performant for Hangul OCR. Benchmark tests show that our approach resolves two main problems of Hangul OCR: class imbalance and target class selection.

翻译：我们展示了韩国字符 OCR( 证人字符识别) 的新型方法, 韩文。韩文作为录音片, 韩文可以代表11, 172个不同的字符, 只有52个图形模型, 以图形模型组合的方式描述每个字符。由于字符的总数可能超过神经网络的能力, 现有的OCR编码方法预设了通常使用的更小的字符组。此设计选择自然会影响发行中长尾字符的性能。在这项工作中, 我们证明图形编码不仅有效, 而且还能表现汉文 OCR。基准测试显示, 我们的方法解决了韩文OCR的两个主要问题: 阶级不平衡和目标类选择。

0

相关内容

OCR

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

【ICIG2021】Latest News & Announcements of the Industry Talk1

【ICIG2021】Latest News & Announcements of the Industry Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年7月28日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【推荐】深度学习目标检测概览

【推荐】深度学习目标检测概览

机器学习研究会

10+阅读 · 2017年9月1日

染色质重构蛋白CHR5在拟南芥抗病免疫反应中的功能研究

国家自然科学基金

0+阅读 · 2015年12月31日

结缔组织生长因子（CTGF)在再生障碍性贫血发病机制中的作用研究

国家自然科学基金

0+阅读 · 2015年12月31日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

CRL4泛素化酶在染色质重塑和DNA同源重组修复过程中的功能研究

国家自然科学基金

0+阅读 · 2014年12月31日

microRNA-224在牙釉质发育中的作用及机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

MACF1通过Wnt/β-catenin信号通路参与模拟失重抑制成骨细胞分化的机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

水莱茵海默氏菌 (Rheinheimera aquimaris) 淬灭细菌群体感应的机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

夏季中尺度强降水天气系统的可预报性研究

国家自然科学基金

0+阅读 · 2012年12月31日

斑马鱼β-catenin核转运的调控机理研究

国家自然科学基金

0+阅读 · 2011年12月31日

IPTV视频质量的多源特征提取与融合评价模型研究

国家自然科学基金

0+阅读 · 2009年12月31日

Scaling up Stochastic Gradient Descent for Non-convex Optimisation

Arxiv

0+阅读 · 2022年10月6日

Text Characterization Toolkit

Arxiv

0+阅读 · 2022年10月4日

Shape Completion with Points in the Shadow

Arxiv

0+阅读 · 2022年10月4日

CLINICAL: Targeted Active Learning for Imbalanced Medical Image Classification

Arxiv

0+阅读 · 2022年10月4日

Structural Estimation of Markov Decision Processes in High-Dimensional State Space with Finite-Time Guarantees

Arxiv

0+阅读 · 2022年10月4日

A Method for Discovering Novel Classes in Tabular Data

Arxiv

0+阅读 · 2022年10月3日

Building for Tomorrow: Assessing the Temporal Persistence of Text Classifiers

Arxiv

0+阅读 · 2022年10月1日

Natural Language Descriptions of Deep Visual Features

Arxiv

12+阅读 · 2022年1月26日

Learning to Learn and Predict: A Meta-Learning Approach for Multi-Label Classification

Learning to Learn and Predict: A Meta-Learning Approach for Multi-Label Classification

Arxiv

17+阅读 · 2019年9月9日

Class-Balanced Loss Based on Effective Number of Samples

Arxiv

12+阅读 · 2019年1月16日

VIP会员

文章信息

相关主题

Neural Networks

相关VIP内容

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

检索增强生成（RAG）技术，261页slides

美联参会指南-联合规划与执行概述及政策框架 | 32页

从DeepSeek-R1学到的三个核心经验

大规模视觉模型中的提示式适配：综述

相关资讯

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

【ICIG2021】Latest News & Announcements of the Industry Talk1

【ICIG2021】Latest News & Announcements of the Industry Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年7月28日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【推荐】深度学习目标检测概览

【推荐】深度学习目标检测概览

机器学习研究会

10+阅读 · 2017年9月1日

相关论文

Scaling up Stochastic Gradient Descent for Non-convex Optimisation

Arxiv

0+阅读 · 2022年10月6日

Text Characterization Toolkit

Arxiv

0+阅读 · 2022年10月4日

Shape Completion with Points in the Shadow

Arxiv

0+阅读 · 2022年10月4日

CLINICAL: Targeted Active Learning for Imbalanced Medical Image Classification

Arxiv

0+阅读 · 2022年10月4日

Structural Estimation of Markov Decision Processes in High-Dimensional State Space with Finite-Time Guarantees

Arxiv

0+阅读 · 2022年10月4日

A Method for Discovering Novel Classes in Tabular Data

Arxiv

0+阅读 · 2022年10月3日

Building for Tomorrow: Assessing the Temporal Persistence of Text Classifiers

Arxiv

0+阅读 · 2022年10月1日

Natural Language Descriptions of Deep Visual Features

Arxiv

12+阅读 · 2022年1月26日

Learning to Learn and Predict: A Meta-Learning Approach for Multi-Label Classification

Learning to Learn and Predict: A Meta-Learning Approach for Multi-Label Classification

Arxiv

17+阅读 · 2019年9月9日

Class-Balanced Loss Based on Effective Number of Samples

Arxiv

12+阅读 · 2019年1月16日

相关基金

染色质重构蛋白CHR5在拟南芥抗病免疫反应中的功能研究

国家自然科学基金

0+阅读 · 2015年12月31日

结缔组织生长因子（CTGF)在再生障碍性贫血发病机制中的作用研究

国家自然科学基金

0+阅读 · 2015年12月31日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

CRL4泛素化酶在染色质重塑和DNA同源重组修复过程中的功能研究

国家自然科学基金

0+阅读 · 2014年12月31日

microRNA-224在牙釉质发育中的作用及机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

MACF1通过Wnt/β-catenin信号通路参与模拟失重抑制成骨细胞分化的机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

水莱茵海默氏菌 (Rheinheimera aquimaris) 淬灭细菌群体感应的机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

夏季中尺度强降水天气系统的可预报性研究

国家自然科学基金

0+阅读 · 2012年12月31日

斑马鱼β-catenin核转运的调控机理研究

国家自然科学基金

0+阅读 · 2011年12月31日

IPTV视频质量的多源特征提取与融合评价模型研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员