PageNet: Towards End-to-End Weakly Supervised Page-Level Handwritten Chinese Text Recognition

Handwritten Chinese text recognition (HCTR) has been an active research topic for decades. However, most previous studies focus solely on recognizing cropped text line images and ignore the errors introduced by text line detection in real-world applications. Although some approaches to page-level text recognition have been proposed in recent years, they are either limited to simple layouts or require very detailed annotations, including expensive line-level and even character-level bounding boxes. To this end, we propose PageNet for end-to-end weakly supervised page-level HCTR. PageNet detects and recognizes characters and predicts the reading order between them, which makes it more robust and flexible when dealing with complex layouts, including multi-directional and curved text lines. With the proposed weakly supervised learning framework, PageNet requires only transcript annotations for real data, yet it can still output detection and recognition results at both the character and line levels, avoiding the labor and cost of labeling bounding boxes for characters and text lines. Extensive experiments on five datasets demonstrate the superiority of PageNet over existing weakly supervised and fully supervised page-level methods. These experimental results may spark further research beyond the realms of existing methods based on connectionist temporal classification or attention. The source code is available at https://github.com/shannanyinxiang/PageNet.
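The abstract states that PageNet detects and recognizes individual characters and predicts the reading order between them, and that line-level results are derived from these character-level outputs. The Python sketch below shows one way such outputs could be assembled into text lines. It is not PageNet's implementation. The `CharPred` record, its `next_idx` successor link and `is_line_start` flag, and the helpers `group_into_lines`, `line_text`, and `line_box` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


@dataclass
class CharPred:
    """Hypothetical per-character output: recognized label, box, and reading-order link."""
    label: str
    box: Box
    next_idx: Optional[int]  # index of the next character in the same line; None ends the line
    is_line_start: bool      # True if this character begins a text line


def group_into_lines(chars: List[CharPred]) -> List[List[int]]:
    """Chain characters into lines by following successor links from each line start."""
    lines: List[List[int]] = []
    visited = set()
    for start, c in enumerate(chars):
        if not c.is_line_start or start in visited:
            continue
        line: List[int] = []
        idx: Optional[int] = start
        # Stop at the end of a line, or if a link points to an already-used character
        # (guards against cycles or two lines claiming the same character).
        while idx is not None and idx not in visited:
            visited.add(idx)
            line.append(idx)
            idx = chars[idx].next_idx
        lines.append(line)
    return lines


def line_text(chars: List[CharPred], line: List[int]) -> str:
    """Line-level transcript: concatenate character labels in reading order."""
    return "".join(chars[i].label for i in line)


def line_box(chars: List[CharPred], line: List[int]) -> Box:
    """Line-level box: the axis-aligned union of the line's character boxes."""
    boxes = [chars[i].box for i in line]
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


if __name__ == "__main__":
    # Toy input: two horizontal two-character lines (made-up values).
    chars = [
        CharPred("手", (0, 0, 10, 10), 1, True),
        CharPred("写", (10, 0, 20, 10), None, False),
        CharPred("文", (0, 20, 10, 30), 3, True),
        CharPred("本", (10, 20, 20, 30), None, False),
    ]
    for line in group_into_lines(chars):
        print(line_text(chars, line), line_box(chars, line))
```

Because lines are built by following predicted links instead of sorting characters by coordinates, vertical, slanted, or curved lines need no special handling in this sketch. The axis-aligned union box is kept only for brevity. A polygon through the character boxes would fit curved lines more closely.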

翻译：数十年来,中国手写文本识别(HCTR)一直是一项积极的研究课题,然而,以往的研究大多仅侧重于识别裁剪文本线图像,忽略了在现实世界应用中发现文本线造成的错误。尽管近年来提出了一些旨在页面层次文本识别的方法,但近年来,这些方法要么局限于简单的布局,要么需要非常详细的说明,包括昂贵的线级甚至字符级的字符绑定框。为此,我们提议PageNet用于终端到终端,监管薄弱的页面级HCTR。PageNet检测和识别字符并预测它们之间的阅读顺序,在处理复杂布局时,包括多方向和曲线的文本线时,这些功能更加强大和灵活。利用拟议的监管薄弱的文本识别框架,PageNet只需要对真实数据的注释;然而,它仍然可以在字符和线级上输出检测和识别结果,避免给字符框和文本线贴标签的劳动和成本。在五套数据集上进行的广泛实验显示PageNet的优越性,而这种特征网络比现有的监管薄弱和完全监管的页面/曲线线条更灵活。现有的Shelsmain连接法系/destrangeal res源可能是基于地域的实验源。