As more work moves onto computing devices and digital media, methods that convert previously manual tasks into digital form are increasingly valuable. Although many documentation tasks can now be done online, handwritten text remains unavoidable in many applications and domains, which makes the digitization of handwritten documents an essential task. Offline handwritten text recognition has been researched extensively over the past decades, and most recent work has shifted to machine learning and deep learning based approaches. Designing deeper, more complex networks that perform well requires larger quantities of annotated data. Most existing databases for offline handwritten text recognition have been annotated either manually or semi-automatically with substantial manual involvement, a process that is time consuming and prone to human error. To tackle this problem, we present a complete end-to-end pipeline that annotates offline handwritten manuscripts written in both print and cursive English, using deep learning and user interaction techniques. This novel method combines a detection system built upon a state-of-the-art text detection model with a custom deep learning model for recognition, and pairs them with an easy-to-use interactive interface. It aims to improve the accuracy of the detection, segmentation, serialization and recognition phases, so as to produce high-quality annotated data with minimal human interaction.