The arrival of handwriting recognition technologies offers new possibilities for research in heritage studies. However, it is now necessary to reflect on the experience gained and the practices developed by research teams. Our use of the Transkribus platform since 2018 has led us to look for the most significant ways to improve the performance of our handwritten text recognition (HTR) models, which are trained to transcribe 17th-century French handwriting. This article therefore reports on how three factors can help increase the performance of HTR models: creating transcription protocols, making full use of lexical elements, and determining the best way to use base models. Combining all of these elements can increase the performance of a single model by more than 20% (reaching a Character Error Rate, or CER, below 5%). The article also discusses some challenges related to the collaborative nature of HTR platforms such as Transkribus and to the ways researchers can share the data generated while creating or training HTR models.
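To make the CER figure concrete, the sketch below computes the Character Error Rate in its common textbook form: the character-level edit distance (substitutions, deletions and insertions) between a reference transcription and a model's output, divided by the length of the reference. This is an illustrative assumption, not the exact computation used by Transkribus, whose normalization choices (for example, the treatment of whitespace or punctuation) may differ. The function names `levenshtein` and `cer` are hypothetical.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character substitutions, deletions and insertions
    needed to turn `hyp` into `ref` (classic dynamic programming)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion: ref char missing from hyp
                            curr[j - 1] + 1,      # insertion: extra char in hyp
                            prev[j - 1] + cost))  # substitution, or match if cost == 0
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: edit distance divided by the reference length.
    Assumes the common definition; platform implementations may normalize differently."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical example: two substituted characters ("y" -> "i", "F" -> "f")
    # in a 16-character reference line give a CER of 2 / 16.
    print(cer("le roy de France", "le roi de france"))  # 0.125
```

Under this definition, a CER below 5% means fewer than 5 character edits per 100 characters of the reference transcription.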