Accurate layout analysis without subsequent text-line segmentation remains an ongoing challenge, especially for the Kangyur, a kind of historical Tibetan document with many touching components and a mottled background. Layout analysis, which aims to identify the different regions in a document image, is indispensable for subsequent procedures such as character recognition. However, little research has addressed line-level layout analysis, and the existing methods fail to handle the Kangyur. To obtain better results, we present a fine-grained sub-line-level layout analysis approach. First, we introduce an accelerated method to build the dataset, which is dynamic and reliable. Second, we enhance SOLOv2 according to the characteristics of the Kangyur. Third, we feed the prepared annotation file to the enhanced SOLOv2 during the training phase. Once the network is trained, instances of text lines, sentences, and titles can be segmented and identified at inference time. Experimental results show that the proposed method achieves 72.7% average precision on our dataset. Overall, this preliminary research provides insights into fine-grained sub-line-level layout analysis and validates SOLOv2-based approaches. We also believe that the proposed method can be adapted to documents in other languages with various layouts.
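For readers unfamiliar with how average precision is typically scored for instance segmentation, the following minimal sketch shows the mask intersection-over-union (IoU) test that usually decides whether a predicted instance matches a ground-truth one. It is an illustration under stated assumptions, not the evaluation code behind the 72.7% figure: the function names, the label strings, and the 0.5 threshold are all hypothetical, and the abstract does not say which IoU thresholds were used.

```python
from typing import List

# Binary mask as nested lists; 1 marks a pixel that belongs to the instance.
Mask = List[List[int]]


def mask_iou(pred: Mask, truth: Mask) -> float:
    """Return intersection over union for two binary masks of equal shape."""
    same_rows = len(pred) == len(truth)
    if not same_rows or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("masks must have the same shape")
    inter = 0
    union = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # Two empty masks share no pixels; report 0.0 rather than divide by zero.
    return inter / union if union else 0.0


def is_true_positive(
    pred: Mask,
    truth: Mask,
    pred_label: str,
    truth_label: str,
    threshold: float = 0.5,  # assumed threshold, not taken from the abstract
) -> bool:
    """A prediction counts as a match if the class agrees and IoU >= threshold."""
    return pred_label == truth_label and mask_iou(pred, truth) >= threshold


if __name__ == "__main__":
    # Hypothetical 2x3 masks for a "title" region.
    predicted = [[1, 1, 0], [1, 1, 0]]
    annotated = [[0, 1, 1], [0, 1, 1]]
    print(mask_iou(predicted, annotated))
    print(is_true_positive(predicted, annotated, "title", "title"))
```

Average precision is then obtained by ranking predictions by confidence, marking each as a true or false positive with a test like the one above, and integrating the resulting precision-recall curve, commonly averaged over classes and over several IoU thresholds.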