Key information extraction from document images is of paramount importance in office automation. Conventional template-matching-based approaches generalize poorly to document images with unseen templates and are not robust to text recognition errors. In this paper, we propose an end-to-end Spatial Dual-Modality Graph Reasoning method (SDMG-R) to extract key information from unstructured document images. We model document images as dual-modality graphs whose nodes encode both the visual and textual features of detected text regions and whose edges represent the spatial relations between neighboring text regions. Key information extraction is then performed by iteratively propagating messages along graph edges and reasoning about the categories of graph nodes. To thoroughly evaluate the proposed method and to facilitate future research, we release a new dataset named WildReceipt, collected and annotated specifically for evaluating key information extraction from document images with unseen templates in the wild. It contains 25 key information categories and about 69,000 text boxes in total, making it about twice as large as existing public datasets. Extensive experiments show that visual features, textual features, and spatial relations all benefit key information extraction. The results show that SDMG-R can effectively extract key information from document images with unseen templates, and it achieves new state-of-the-art results on the popular SROIE benchmark and on our WildReceipt dataset. Our code and dataset will be publicly released.
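The abstract describes the pipeline only at a high level: build a graph whose nodes carry both visual and textual features and whose edges link spatially neighboring text regions, propagate messages along the edges for several iterations, then classify each node. The sketch below illustrates that flow in plain Python under loudly stated assumptions: features are precomputed, the two modalities are fused by simple concatenation, edges connect regions whose box centers lie within a hypothetical `radius`, edge weights decay with distance, and the classifier weights are fixed by hand rather than learned. The names `TextRegion`, `build_graph`, `propagate`, `classify`, and the labels `"key"`, `"value"`, `"other"` are all hypothetical. This is not the SDMG-R architecture, which is trained end to end; it only shows the shape of graph-based node classification.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TextRegion:
    # Hypothetical container for one detected text region.
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
    visual: List[float]   # visual feature vector (assumed precomputed)
    textual: List[float]  # textual feature vector (assumed precomputed)


def center(box):
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def build_graph(regions, radius):
    """Fuse both modalities per node and link regions whose centers are within `radius`.

    Fusion is plain concatenation and edge weights decay as exp(-dist / radius);
    both are illustrative assumptions, not the paper's learned components.
    """
    nodes = [r.visual + r.textual for r in regions]
    neighbors = [[] for _ in regions]  # neighbors[i] = [(j, weight), ...]
    for i, ri in enumerate(regions):
        cxi, cyi = center(ri.box)
        for j, rj in enumerate(regions):
            if i == j:
                continue
            cxj, cyj = center(rj.box)
            dist = math.hypot(cxj - cxi, cyj - cyi)
            if dist <= radius:
                neighbors[i].append((j, math.exp(-dist / radius)))
    return nodes, neighbors


def propagate(nodes, neighbors, iterations):
    """Iteratively update h_i <- relu(h_i + sum_j a_ij * h_j), with a_ij normalized per node."""
    h = [list(v) for v in nodes]
    for _ in range(iterations):
        new_h = []
        for i, hi in enumerate(h):
            total = sum(w for _, w in neighbors[i])
            msg = [0.0] * len(hi)
            for j, w in neighbors[i]:  # isolated nodes receive no message
                a = w / total
                for k, value in enumerate(h[j]):
                    msg[k] += a * value
            new_h.append([max(0.0, x + m) for x, m in zip(hi, msg)])
        h = new_h  # synchronous update: every node reads the previous iteration's states
    return h


def classify(h, weights, labels):
    """Linear scoring per node; weights[c] is the (hypothetical) weight vector for labels[c]."""
    preds = []
    for hi in h:
        scores = [sum(w * x for w, x in zip(wc, hi)) for wc in weights]
        preds.append(labels[scores.index(max(scores))])
    return preds


# Usage: three regions on one line; only horizontally adjacent pairs become edges.
regions = [
    TextRegion((0, 0, 10, 10), [1.0, 0.0], [0.0]),
    TextRegion((20, 0, 30, 10), [0.0, 1.0], [0.0]),
    TextRegion((40, 0, 50, 10), [0.0, 0.0], [1.0]),
]
nodes, neighbors = build_graph(regions, radius=25.0)
h = propagate(nodes, neighbors, iterations=1)
weights = [[1.0, 0.0, -1.0], [-1.0, 0.0, 1.0], [0.0, 0.5, 0.0]]
print(classify(h, weights, ["key", "value", "other"]))
```

Keeping message passing separate from classification mirrors the abstract's two steps (propagate along edges, then reason about node categories); in a trained model, the fusion, edge weighting, update, and classifier would all be learned rather than hand-set as here.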