Tables are information-rich structured objects in document images. While significant work has been done on localizing tables as graphic objects in document images, only limited attempts exist at table structure recognition. Most existing literature on structure recognition depends either on extracting meta-features from the PDF document or on optical character recognition (OCR) models that extract low-level layout features from the image. However, these methods fail to generalize well, either because meta-features are absent or because the OCR makes errors when table layouts and text organization vary significantly. In our work, we focus on tables that have complex structures, dense content, and varying layouts, with no dependency on meta-features and/or OCR. We present an approach for table structure recognition that combines cell detection and interaction modules to localize the cells and predict their row and column associations with other detected cells. We incorporate structural constraints as additional differential components to the loss function for cell detection. We empirically validate our method on publicly available real-world datasets: ICDAR-2013, ICDAR-2019 (cTDaR) archival, UNLV, SciTSR, SciTSR-COMP, TableBank, and PubTabNet. Our attempt opens up a new direction for table structure recognition by combining top-down (table cell detection) and bottom-up (structure recognition) cues in visually understanding the tables.
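To make the idea of a structural constraint on detected cells more concrete, the following is a minimal sketch and not the loss used in this work: the abstract does not specify the form of the constraints. It assumes, purely for illustration, that cells in the same row should share a top edge and cells in the same column should share a left edge, and it penalizes squared deviations from that alignment. The names `alignment_penalty` and `structural_penalty`, the box format `(x1, y1, x2, y2)`, and the row and column ids are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical box format (an assumption, not from the source): (x1, y1, x2, y2)
Box = Tuple[float, float, float, float]


def alignment_penalty(boxes: List[Box], group_ids: List[int], coord: int) -> float:
    """Mean squared deviation of one box coordinate from its group mean."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for box, gid in zip(boxes, group_ids):
        groups[gid].append(box[coord])

    total, count = 0.0, 0
    for values in groups.values():
        mean = sum(values) / len(values)
        total += sum((v - mean) ** 2 for v in values)
        count += len(values)
    return total / count if count else 0.0


def structural_penalty(boxes: List[Box], row_ids: List[int], col_ids: List[int]) -> float:
    """Illustrative constraint: same-row cells share y1, same-column cells share x1."""
    row_term = alignment_penalty(boxes, row_ids, coord=1)  # top edges within each row
    col_term = alignment_penalty(boxes, col_ids, coord=0)  # left edges within each column
    return row_term + col_term


# A 2x2 grid whose top-right cell is shifted down by 2 units.
boxes = [(0, 0, 10, 5), (12, 2, 20, 5), (0, 6, 10, 10), (12, 6, 20, 10)]
row_ids = [0, 0, 1, 1]
col_ids = [0, 1, 0, 1]
penalty = structural_penalty(boxes, row_ids, col_ids)
```

Because the penalty is built only from squared differences, the same expression written with tensors in an automatic-differentiation framework could, under these assumptions, be added as an extra term to a cell detection loss.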