Image-based table recognition is a challenging task due to the diversity of table styles and the complexity of table structures. Most previous methods take a non-end-to-end approach that divides the problem into two separate sub-problems, table structure recognition and cell-content recognition, and solves each independently with a separate system. In this paper, we propose an end-to-end multi-task learning model for image-based table recognition. The proposed model consists of one shared encoder, one shared decoder, and three separate decoders for the three sub-tasks of table recognition: table structure recognition, cell detection, and cell-content recognition. The whole system can be trained and run for inference in an end-to-end manner. In the experiments, we evaluate the proposed model on two large-scale datasets, FinTabNet and PubTabNet. The results show that the proposed model outperforms state-of-the-art methods on both benchmark datasets.
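The multi-task layout described above (one shared encoder and shared decoder feeding three task-specific decoders) can be sketched as follows. This is a minimal illustrative skeleton, not the authors' implementation: all function names, the placeholder feature dictionaries, and the dummy outputs are assumptions made for clarity.

```python
# Hypothetical sketch of the multi-task table-recognition pipeline.
# Real systems would use learned neural modules; here each stage is a
# stub so the data flow between the shared and task-specific parts is clear.

def shared_encoder(image):
    # Extract visual features from the table image (stub).
    return {"features": image}

def shared_decoder(encoded):
    # Shared decoding step whose output feeds all three task heads (stub).
    return {"hidden": encoded["features"]}

def structure_decoder(hidden):
    # Predict the table structure as a tag sequence (stub output).
    return ["<table>", "<tr>", "<td>", "</td>", "</tr>", "</table>"]

def cell_detector(hidden):
    # Predict bounding boxes (x1, y1, x2, y2) for table cells (stub output).
    return [(0, 0, 10, 10)]

def content_decoder(hidden, boxes):
    # Recognize the text content of each detected cell (stub output).
    return ["cell text" for _ in boxes]

def recognize_table(image):
    # End-to-end forward pass: shared encoder/decoder, then three heads.
    encoded = shared_encoder(image)
    hidden = shared_decoder(encoded)
    structure = structure_decoder(hidden)
    boxes = cell_detector(hidden)
    contents = content_decoder(hidden, boxes)
    return structure, boxes, contents
```

Because all three heads consume the same shared representation, the model can be trained with a single combined loss and run in one forward pass, in contrast to the two-system pipeline used by non-end-to-end methods.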