Table structure recognition (TSR) aims to convert tables in images into machine-understandable formats. Recent methods address this problem either by predicting adjacency relations between detected cell boxes or by learning to generate the corresponding markup sequences from table images. However, they either rely on additional heuristic rules to recover the table structure, or require large amounts of training data and time-consuming sequential decoders. In this paper, we propose an alternative paradigm. We model TSR as a logical location regression problem and propose a new TSR framework called LORE (LOgical location REgression network), which is the first to combine logical location regression with spatial location regression of table cells. LORE is conceptually simpler, easier to train, and more accurate than previous TSR models of other paradigms. Experiments on standard benchmarks show that LORE consistently outperforms prior methods. Code is available at https://github.com/AlibabaResearch/AdvancedLiterateMachinery/tree/main/DocumentUnderstanding/LORE-TSR.
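To illustrate why regressed logical locations can be turned into table markup without extra heuristics, here is a minimal sketch. It assumes that each cell's logical location is a 4-tuple (start row, end row, start column, end column) predicted as real values. The names `LogicalCell` and `to_html` are hypothetical and do not come from the LORE code. Rounding each prediction to the nearest integer index gives the row and column spans directly.

```python
from dataclasses import dataclass


@dataclass
class LogicalCell:
    """Hypothetical cell record: text plus real-valued logical-location predictions."""
    text: str
    row_start: float
    row_end: float
    col_start: float
    col_end: float


def to_html(cells):
    """Round regressed logical locations to integer indices and emit an HTML table.

    Assumption: a cell spanning rows rs..re and columns cs..ce (inclusive) becomes
    a <td> in row rs with rowspan = re - rs + 1 and colspan = ce - cs + 1.
    """
    rounded = []
    for c in cells:
        rs, re_, cs, ce = (round(v) for v in (c.row_start, c.row_end, c.col_start, c.col_end))
        # Guard against an end index regressed below its start index.
        rounded.append((c.text, rs, max(re_, rs), cs, max(ce, cs)))

    n_rows = max(r[2] for r in rounded) + 1
    rows = []
    for r in range(n_rows):
        # Cells are emitted in the row where they start, ordered by start column.
        starting = sorted((x for x in rounded if x[1] == r), key=lambda x: x[3])
        tds = []
        for text, rs, re_, cs, ce in starting:
            attrs = ""
            if re_ > rs:
                attrs += f' rowspan="{re_ - rs + 1}"'
            if ce > cs:
                attrs += f' colspan="{ce - cs + 1}"'
            tds.append(f"<td{attrs}>{text}</td>")
        rows.append("<tr>" + "".join(tds) + "</tr>")
    return "<table>" + "".join(rows) + "</table>"


# Usage: a header cell spanning two columns above two body cells.
cells = [
    LogicalCell("Name", 0.1, -0.2, 0.05, 1.1),
    LogicalCell("a", 0.9, 1.2, -0.1, 0.2),
    LogicalCell("b", 1.1, 0.8, 0.9, 1.3),
]
print(to_html(cells))
```

Note that Python's `round` uses round-half-to-even, so predictions that fall exactly on .5 may round down. A real decoder might choose a different tie-breaking rule.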