Answering questions about tables in business documents poses several challenges: it requires understanding tabular structure, resolving cross-document references, and performing numeric computations that go beyond simple search queries. This paper introduces TabIQA, a novel pipeline for answering questions about business document images. TabIQA combines state-of-the-art deep learning techniques to (1) extract table content and structural information from images and (2) answer questions over the resulting structured tables, covering numerical data, text-based information, and complex queries. Evaluation results on the VQAonBD 2023 dataset show that TabIQA achieves promising performance on table-related questions. The TabIQA repository is available at https://github.com/phucty/itabqa.