Automating Key Information Extraction (KIE) from documents improves efficiency, productivity, and security in many industrial scenarios such as rapid indexing and archiving. Many existing supervised learning methods for the KIE task require a large number of labeled samples and learn separate models for different types of documents. However, collecting and labeling a large dataset is time-consuming and is not a user-friendly requirement for many cloud platforms. To overcome these challenges, we propose a deep end-to-end trainable network for one-shot KIE using partial graph matching. Unlike previous methods, in which similarity learning and solving are optimized separately, our method learns the two processes in a single end-to-end framework. Existing one-shot KIE methods are either template-based or simple attention-based learning approaches, which struggle to handle text that printers have shifted away from its intended position, as illustrated in Fig. 1. To solve this problem, we add a one-to-(at most)-one constraint so that a globally optimal solution can be found even when some text has drifted. Further, we design a multimodal context ensemble block that boosts performance by fusing features of spatial, textual, and aspect representations. To promote research on KIE, we collected and annotated a one-shot document KIE dataset named DKIE with diverse types of images. The DKIE dataset consists of 2.5K document images captured by mobile phones in natural scenes, and it is the largest available one-shot KIE dataset to date. Experiments on DKIE show that our method achieves state-of-the-art performance compared with recent one-shot and supervised learning approaches. The dataset and the proposed one-shot KIE model will be released soon.
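To make the one-to-(at most)-one constraint concrete, the following is a minimal sketch, not the network proposed here. It assumes a similarity matrix between the fields of a single labeled support document and the text boxes of a query document is already available, and it replaces the learned, end-to-end solver with a brute-force search that is only practical for toy inputs. The names `partial_match`, `rowwise_argmax`, and `unmatched_score`, as well as the numbers in `sim`, are hypothetical and chosen purely for illustration.

```python
from itertools import permutations


def rowwise_argmax(sim):
    """Baseline: each field independently picks its most similar box.

    Nothing stops two fields from choosing the same box.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in sim]


def partial_match(sim, unmatched_score=0.5):
    """Brute-force one-to-(at most)-one matching (illustrative only).

    sim[i][j] is the similarity between support field i and query box j.
    A field may stay unmatched (None), which earns `unmatched_score`
    (a hypothetical constant). Each box is used at most once.
    Returns (assignment, total_score).
    """
    n_fields = len(sim)
    n_boxes = len(sim[0]) if sim else 0
    # One dummy slot per field, so every field is allowed to stay unmatched.
    slots = list(range(n_boxes)) + [None] * n_fields
    best_score, best_assign = float("-inf"), None
    # Choosing distinct slot indices guarantees no real box is reused.
    for perm in permutations(range(len(slots)), n_fields):
        assign = [slots[k] for k in perm]
        score = sum(
            unmatched_score if j is None else sim[i][j]
            for i, j in enumerate(assign)
        )
        if score > best_score:
            best_score, best_assign = score, assign
    return best_assign, best_score


# Hypothetical similarities: 3 support fields (rows) x 3 query boxes (columns).
sim = [
    [0.6, 0.7, 0.1],  # field 0 belongs to box 0, but drift favors box 1
    [0.2, 0.9, 0.1],  # field 1 belongs to box 1
    [0.1, 0.2, 0.3],  # field 2 has no convincing counterpart
]

greedy = rowwise_argmax(sim)
matched, total = partial_match(sim)
print("row-wise argmax:", greedy)
print("partial matching:", matched)
```

In this toy case the row-wise baseline assigns fields 0 and 1 to the same box, whereas the constrained search keeps field 0 on box 0 and leaves field 2 unmatched. For realistic sizes a polynomial-time assignment solver, or a differentiable relaxation if the matching has to be trained end to end, would be needed in place of the exhaustive search.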