The automated analysis of administrative documents is an important field of document recognition that has been studied for decades. Invoices are key documents among the huge volumes of documents held by companies and public services. Most of the time, invoices present their data in tables, which must be clearly identified in order to extract the relevant information. In this paper, we propose an approach that combines an image-processing-based estimation of the shape of the tables with a graph-based representation of the document, which is used to identify complex tables precisely. We present an experimental evaluation on a real-world application.