LGPMA: Complicated Table Structure Recognition with Local and Global Pyramid Mask Alignment

Source: arXiv. Winner of the ICDAR 2021 Best Industry Paper award. Code is available at https://davar-lab.github.io/publication.html or https://github.com/hikopensource/DAVAR-Lab-OCR (revision note: fixed formula typos in Eq. 1)

Table structure recognition is a challenging task because of the variety of table structures and complicated cell-spanning relations. Previous methods approached the problem from elements of different granularities (rows/columns, text regions), which led to issues such as lossy heuristic rules or the neglect of empty-cell division. Based on the characteristics of table structure, we find that obtaining the aligned bounding boxes of text regions can effectively preserve the entire relevant range of different cells. However, aligned bounding boxes are hard to predict accurately because of visual ambiguities. In this paper, we aim to obtain more reliable aligned bounding boxes by fully utilizing the visual information from both text regions in the proposed local features and cell relations in global features. Specifically, we propose the framework of Local and Global Pyramid Mask Alignment (LGPMA), which adopts a soft pyramid mask learning mechanism in both the local and global feature maps. This allows the predicted boundaries of bounding boxes to extend beyond the limits of the original proposals. A pyramid mask re-scoring module is then integrated to balance the local and global information and refine the predicted boundaries. Finally, we propose a robust table structure recovery pipeline to obtain the final structure, in which we also effectively solve the problems of locating and dividing empty cells. Experimental results show that the proposed method achieves competitive and even new state-of-the-art performance on several public benchmarks.
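
As an illustration of the soft pyramid mask idea mentioned in the abstract, the following is a minimal Python sketch of one plausible soft target: a value that rises linearly from 0 at the borders of an aligned bounding box to 1 at the midpoint of the text region, computed separately in the horizontal and vertical directions. The exact target definition is given in the paper's Eq. 1, which is not reproduced in this abstract, so the linear ramp, the choice of the text-region midpoint as the peak, and the function names `ramp_1d` and `soft_pyramid_masks` are all assumptions made for illustration and are not taken from the DAVAR-Lab-OCR code.

```python
def ramp_1d(length, peak):
    """Linear ramp over integer positions 0..length.

    Assumed form (not from the paper): 0 at both borders, 1 at `peak`.
    """
    if not 0 < peak < length:
        raise ValueError("peak must lie strictly inside (0, length)")
    values = []
    for i in range(length + 1):
        if i <= peak:
            values.append(i / peak)  # rising side
        else:
            values.append((length - i) / (length - peak))  # falling side
    return values


def soft_pyramid_masks(width, height, text_box):
    """Return (horizontal, vertical) soft masks for one aligned box.

    `text_box` = (x1, y1, x2, y2) is the text region inside an aligned
    box of size width x height. The peak of each ramp is placed at the
    text-region midpoint (an assumption for this sketch).
    """
    x1, y1, x2, y2 = text_box
    horiz_profile = ramp_1d(width, (x1 + x2) / 2)
    vert_profile = ramp_1d(height, (y1 + y2) / 2)
    # Horizontal mask: the same left-to-right profile on every row.
    horizontal = [list(horiz_profile) for _ in range(height + 1)]
    # Vertical mask: each row holds a single top-to-bottom profile value.
    vertical = [[v] * (width + 1) for v in vert_profile]
    return horizontal, vertical


if __name__ == "__main__":
    h_mask, v_mask = soft_pyramid_masks(4, 2, (1, 0, 3, 2))
    print(h_mask[0])
    print([row[0] for row in v_mask])
```

Because such a target varies smoothly across the whole box instead of being a hard 0/1 mask, a model regressing it carries information about where the box border should lie even when the initial proposal is slightly off, which is the intuition behind letting predicted boundaries move beyond the original proposals.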
