Localizing page elements/objects such as tables, figures, and equations is the primary step in extracting information from document images. We propose CDeC-Net, a novel end-to-end trainable deep network for detecting tables present in documents. The proposed network is a multi-stage extension of Mask R-CNN with a dual backbone and deformable convolutions, detecting tables of varying scales with high accuracy even at higher IoU thresholds. We empirically evaluate CDeC-Net with extensive experiments on all the publicly available benchmark datasets: ICDAR-2013, ICDAR-2017, ICDAR-2019, UNLV, Marmot, PubLayNet, and TableBank. Our solution has three important properties: (i) a single trained model, CDeC-Net‡, performs well across all the popular benchmark datasets; (ii) we report excellent performance across multiple IoU thresholds, including the higher ones; (iii) by following the same evaluation protocol as recent papers for each benchmark, we consistently demonstrate superior quantitative performance. Our code and models will be publicly released to enable reproducibility of the results.
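The detection accuracy claims above are stated with respect to IoU (intersection-over-union) thresholds. As a point of reference, the following minimal sketch (not part of CDeC-Net's released code; box format and function name are our own) shows how IoU between a predicted table box and a ground-truth box is computed, using the standard `(x1, y1, x2, y2)` corner convention:

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes in (x1, y1, x2, y2) form.

    A prediction is counted as a correct table detection when its IoU with a
    ground-truth box meets the evaluation threshold (e.g. 0.5, or stricter
    values such as 0.9 in the 'higher IoU' settings mentioned above).
    """
    # Coordinates of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])

    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Example: a prediction shifted halfway off the ground truth.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # → 0.3333...
```

At a lenient 0.5 threshold this shifted prediction would already fail; stricter thresholds reward detectors, like the one proposed here, whose boxes align tightly with the annotated tables.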