In this paper, we introduce a new dataset of annotated pages from books of hours, a type of handwritten prayer book owned and used by wealthy lay people in the late Middle Ages. The dataset was created to support historical research on the evolution of the religious mindset in Europe during this period, since books of hours are one of the major sources of information, thanks both to their rich illustrations and to the different types of religious sources they contain. We first describe how the corpus was collected and manually annotated, then present an evaluation of a state-of-the-art system for text line detection and for zone detection and typing. The corpus is freely available for research.