Recently, table structure recognition has achieved impressive progress with the help of deep graph models. Most of them exploit single visual cues of tabular elements, or simply combine visual cues with other modalities via early fusion, to reason about their graph relationships. However, neither early fusion nor reasoning over each modality individually is appropriate for all of the highly diverse table structures. Instead, different modalities are expected to collaborate with each other in different patterns for different table cases. The importance of intra- and inter-modality interactions for table structure reasoning remains unexplored in the community. In this paper, we define this as the heterogeneous table structure recognition (Hetero-TSR) problem. To fill this gap, we present Neural Collaborative Graph Machines (NCGM), a novel model equipped with stacked collaborative blocks that alternately extract intra-modality context and model inter-modality interactions in a hierarchical way. NCGM represents the intra- and inter-modality relationships of tabular elements more robustly, which significantly improves recognition performance. We also show that NCGM can modulate the collaborative pattern of different modalities conditioned on the context of intra-modality cues, which is vital for diversified table cases. Experimental results on benchmarks demonstrate that NCGM achieves state-of-the-art performance and beats other contemporary methods by a large margin, especially under challenging scenarios.
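To make the alternating structure of a stacked collaborative block more concrete, here is a minimal NumPy sketch. It is not the NCGM implementation. It assumes plain scaled dot-product self-attention for the intra-modality step and cross-attention for the inter-modality step, treats every tabular element as connected to every other element (a fully connected graph), and omits the learned projections, normalization and gating a real model would use. The function names `collaborative_block` and `stacked_blocks` and the modality keys are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    # Scaled dot-product attention: q is (n, d), k and v are (m, d).
    # Acts as message passing over a fully connected graph of elements.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def collaborative_block(modalities):
    """One hypothetical collaborative block.

    modalities: dict mapping a modality name to an (n, d) array holding
    one feature vector per tabular element. Needs at least two modalities.
    """
    # Step 1: intra-modality context, self-attention within each modality
    # plus a residual connection.
    intra = {name: x + attention(x, x, x) for name, x in modalities.items()}

    # Step 2: inter-modality interaction, each modality attends to the
    # elements of all the other modalities (again with a residual).
    out = {}
    for name, x in intra.items():
        others = np.concatenate(
            [y for other, y in intra.items() if other != name], axis=0
        )
        out[name] = x + attention(x, others, others)
    return out


def stacked_blocks(modalities, num_blocks=3):
    # Stack blocks so intra- and inter-modality steps alternate hierarchically.
    for _ in range(num_blocks):
        modalities = collaborative_block(modalities)
    return modalities


rng = np.random.default_rng(0)
features = {
    "visual": rng.normal(size=(5, 8)),    # 5 table elements, 8-dim features
    "text": rng.normal(size=(5, 8)),
    "geometry": rng.normal(size=(5, 8)),
}
outputs = stacked_blocks(features, num_blocks=2)
for name, feats in outputs.items():
    print(name, feats.shape)
```

Because each modality keeps its own per-element features, the inter-modality step can weight the other modalities differently for each element depending on that element's intra-modality context. In this sketch the weighting comes only from attention scores over raw features, with no learned parameters.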