Image-based table recognition is a challenging task due to the diversity of table styles and the complexity of table structures. Most previous methods take a non-end-to-end approach that divides the problem into two separate sub-problems, table structure recognition and cell-content recognition, and then solves each sub-problem independently with two separate systems. In this paper, we propose an end-to-end multi-task learning model for image-based table recognition. The proposed model consists of one shared encoder, one shared decoder, and three separate decoders for learning the three sub-tasks of table recognition: table structure recognition, cell detection, and cell-content recognition. The whole system can be trained, and inference performed, in an end-to-end manner. In the experiments, we evaluate the performance of the proposed model on two large-scale datasets, FinTabNet and PubTabNet. The results show that the proposed model outperforms state-of-the-art methods on all benchmark datasets.
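The multi-task idea above, a shared representation feeding several task-specific heads trained under one joint loss, can be sketched as follows. This is a minimal numpy illustration under assumed toy dimensions, not the paper's actual encoder-decoder architecture; all names and sizes here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions (not from the paper).
IMG_DIM, FEAT_DIM = 64, 32
N_TAGS, N_CHARS = 10, 20  # structure-token and character vocabulary sizes

# Shared encoder: a single linear map standing in for the real image encoder.
W_enc = rng.normal(size=(IMG_DIM, FEAT_DIM))

# Three task-specific heads, mirroring the three sub-tasks:
# table structure, cell detection (bounding boxes), and cell content.
W_struct = rng.normal(size=(FEAT_DIM, N_TAGS))
W_bbox = rng.normal(size=(FEAT_DIM, 4))        # (x, y, w, h) per cell
W_content = rng.normal(size=(FEAT_DIM, N_CHARS))

def forward(image_feats):
    """Encode once with the shared encoder, then branch into three heads."""
    shared = np.tanh(image_feats @ W_enc)      # shared representation
    return {
        "structure": shared @ W_struct,        # table-structure logits
        "bbox": shared @ W_bbox,               # cell-detection regression
        "content": shared @ W_content,         # cell-content logits
    }

def multitask_loss(outputs, targets, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of per-task losses: the standard multi-task objective."""
    l_struct = np.mean((outputs["structure"] - targets["structure"]) ** 2)
    l_bbox = np.mean((outputs["bbox"] - targets["bbox"]) ** 2)
    l_content = np.mean((outputs["content"] - targets["content"]) ** 2)
    return (weights[0] * l_struct
            + weights[1] * l_bbox
            + weights[2] * l_content)

# One joint training step would backpropagate this single scalar loss
# through all three heads and the shared encoder together.
x = rng.normal(size=(5, IMG_DIM))              # batch of 5 toy "images"
out = forward(x)
targets = {k: np.zeros_like(v) for k, v in out.items()}
loss = multitask_loss(out, targets)
```

Because all three heads share one encoder and one scalar objective, the whole system is trained end-to-end rather than as separate pipelines.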