The Challenges of HTR Model Training: Feedback from the Project Donner le goût de l'archive à l'ère numérique

The arrival of handwriting recognition technologies opens new possibilities for research in heritage studies, but it also calls for reflection on the experiences and practices that research teams have developed. We have used the Transkribus platform since 2018, and this has led us to look for the most effective ways to improve the performance of our handwritten text recognition (HTR) models, which are trained to transcribe 17th-century French handwriting. This article reports on the impact of three measures: establishing transcription protocols, using the language model at full scale, and determining the best way to use base models. Combined, these measures can improve the performance of a single model by more than 20%, reaching a Character Error Rate (CER) below 5%. The article also discusses challenges raised by the collaborative nature of HTR platforms such as Transkribus, and the ways researchers can share the data generated while creating or training HTR models.
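For readers unfamiliar with the metric, the Character Error Rate cited above is conventionally defined as the character-level edit distance between a model's output and a reference transcription, divided by the length of the reference. The sketch below illustrates that standard definition only. It is an assumption that this matches how Transkribus computes its reported figures, and the function names `levenshtein` and `character_error_rate` are hypothetical, chosen for illustration.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Return the minimum number of character insertions, deletions,
    and substitutions needed to turn `hypothesis` into `reference`."""
    # previous[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,                      # character missing from hypothesis
                current[j - 1] + 1,                   # extra character in hypothesis
                previous[j - 1] + substitution_cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of characters in the reference.

    Assumption: no normalization (case, whitespace, abbreviations) is applied;
    a project's transcription protocol may prescribe otherwise.
    """
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical strings, not taken from the project's data.
    print(character_error_rate("abcd", "abed"))  # one substitution over four characters
```

Under this definition, a CER below 5% means fewer than five character edits per hundred reference characters. Because the score depends on what counts as a correct character, consistent transcription protocols directly affect the measured value.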

