Recognizing the layout of unstructured digital documents is crucial when parsing them into a structured, machine-readable format for downstream applications. Recent studies in Document Layout Analysis (DLA) usually rely on computer vision models to understand documents and ignore other vital information, such as contextual information or the relations among document components. Our Doc-GCN presents an effective way to harmonize and integrate these heterogeneous aspects for Document Layout Analysis. We first construct graphs that explicitly describe four main aspects: syntactic, semantic, density, and appearance/visual information. Then, we apply graph convolutional networks to represent each aspect of information and use pooling to integrate them. Finally, we aggregate the aspect representations and feed them into 2-layer MLPs for document layout component classification. Our Doc-GCN achieves new state-of-the-art results on three widely used DLA datasets.
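The pipeline described above (one graph per aspect, a graph convolution per graph, pooling across aspects, then a 2-layer MLP) can be illustrated with a minimal NumPy sketch. This is not the Doc-GCN implementation: the function names, the single GCN layer per aspect, the element-wise max pooling, the feature dimensions, and the random inputs are all assumptions made for illustration only.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 used by a standard GCN layer."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(a_norm, x, w):
    """One graph convolution followed by ReLU."""
    return np.maximum(a_norm @ x @ w, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)        # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def classify_components(graphs, params):
    """graphs: {aspect: (adjacency, node_features)}; returns class probabilities per node."""
    per_aspect = [
        gcn_layer(normalize_adjacency(adj), feats, params["gcn"][name])
        for name, (adj, feats) in graphs.items()
    ]
    # Assumption: integrate aspects with an element-wise max over the aspect axis.
    pooled = np.stack(per_aspect).max(axis=0)   # shape: (nodes, hidden)
    # 2-layer MLP classifier over each layout component (node).
    h = np.maximum(pooled @ params["w1"] + params["b1"], 0.0)
    return softmax(h @ params["w2"] + params["b2"])

# Hypothetical toy input: 5 layout components, 4 aspect graphs, 3 layout classes.
rng = np.random.default_rng(0)
n_nodes, hidden, n_classes = 5, 8, 3
feat_dims = {"syntactic": 4, "semantic": 6, "density": 2, "visual": 10}

graphs = {}
for name, dim in feat_dims.items():
    upper = np.triu(rng.integers(0, 2, size=(n_nodes, n_nodes)), k=1)
    adj = (upper + upper.T).astype(float)       # random undirected graph
    graphs[name] = (adj, rng.normal(size=(n_nodes, dim)))

params = {
    "gcn": {name: 0.1 * rng.normal(size=(dim, hidden)) for name, dim in feat_dims.items()},
    "w1": 0.1 * rng.normal(size=(hidden, hidden)),
    "b1": np.zeros(hidden),
    "w2": 0.1 * rng.normal(size=(hidden, n_classes)),
    "b2": np.zeros(n_classes),
}

probs = classify_components(graphs, params)     # shape: (n_nodes, n_classes)
```

Each aspect may carry node features of a different dimension. In this sketch, the per-aspect GCN weights project them to a shared hidden size so that they can be pooled element-wise before classification.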