Information Extraction (IE) for semi-structured document images is often approached as a sequence tagging problem, in which each recognized input token is classified into one of the IOB (Inside, Outside, and Beginning) categories. However, this problem setup has two inherent limitations: (1) it cannot easily handle complex spatial relationships, and (2) it is not suitable for highly structured information. Both are nevertheless frequently observed in real-world document images. To tackle these issues, we first formulate the IE task as a spatial dependency parsing problem that focuses on the relationships among text tokens in the documents. Under this setup, we then propose SPADE (SPAtial DEpendency parser), which models highly complex spatial relationships and an arbitrary number of information layers in the documents in an end-to-end manner. We evaluate it on various kinds of documents, such as receipts, name cards, forms, and invoices, and show that it achieves similar or better performance compared to strong baselines, including a BERT-based IOB tagger.
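To make the contrast between the two formulations concrete, the following is a minimal illustrative sketch, not the SPADE implementation. It assumes a toy receipt line whose tokens arrive in an order that interleaves two fields. The function names `decode_iob` and `decode_graph`, and the `field_heads` / `next_token` edge representation, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def decode_iob(tokens, tags):
    """Group tokens into (field, text) spans from tags such as B-name, I-name, O."""
    spans, field, words = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if field is not None:
                spans.append((field, " ".join(words)))
            field, words = tag[2:], [token]
        elif tag.startswith("I-") and field == tag[2:]:
            words.append(token)
        else:
            # "O", or an I- tag that does not continue the open span:
            # close the open span and drop this token.
            if field is not None:
                spans.append((field, " ".join(words)))
            field, words = None, []
    if field is not None:
        spans.append((field, " ".join(words)))
    return spans


def decode_graph(tokens, field_heads, next_token):
    """Follow explicit token-to-token edges instead of relying on input order.

    field_heads: field name -> indices of the first token of each value (assumed format)
    next_token:  token index -> index of the token that continues the value (assumed format)
    """
    result = defaultdict(list)
    for field, heads in field_heads.items():
        for head in heads:
            words, idx, seen = [], head, set()
            while idx is not None and idx not in seen:  # guard against cycles
                seen.add(idx)
                words.append(tokens[idx])
                idx = next_token.get(idx)
            result[field].append(" ".join(words))
    return dict(result)


# Tokens serialized so that the price falls between the two words of the name.
tokens = ["Cafe", "4.50", "Latte"]

# IOB tagging needs contiguous spans, so the second half of the name is lost.
iob_result = decode_iob(tokens, ["B-name", "B-price", "I-name"])

# Explicit edges recover the full name regardless of serialization order.
graph_result = decode_graph(tokens, {"name": [0], "price": [1]}, {0: 2})
```

In this toy case the IOB decoder cannot reattach "Latte" to "Cafe" because the span is interrupted, whereas the edge-based decoder reads the relationship directly from the graph. Grouping several fields into one item, as in highly structured documents, would need an additional edge type and is omitted here.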