Understanding documents from their visual snapshots is an emerging problem that requires both advanced computer vision and NLP methods. Recent advances in OCR enable accurate recognition of text blocks, yet extracting key information from documents remains challenging because of the diversity of their layouts. Recent studies on pre-trained language models show the importance of incorporating layout information for this task. Even so, the way texts are combined with their layouts still follows the style of BERT, which is optimized for understanding 1D text. This leaves room for further improvement that accounts for the 2D nature of text layouts. This paper introduces a pre-trained language model, BERT Relying On Spatiality (BROS), which effectively uses the information contained in individual text blocks and their layouts. Specifically, BROS encodes spatial information using relative positions and learns spatial dependencies between OCR blocks with a novel area-masking strategy. Together, these two approaches yield an efficient encoding of spatial layout information, as highlighted by the robust performance of BROS in low-resource settings. We also introduce a general-purpose parser that can be combined with BROS to extract key information even when no order information between text blocks is available. BROS demonstrates superior performance on four public benchmarks (FUNSD, SROIE*, CORD, and SciTSR) and remains robust in practical cases where the order of text blocks is not available. Further experiments with varying numbers of training examples demonstrate the high training efficiency of our approach. Our code will be made publicly available.
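The two mechanisms named above, relative spatial positions and area masking over OCR blocks, can be illustrated with a minimal Python sketch. This is not the BROS implementation. The `Block` class, the `relative_positions` and `area_mask` functions, the rule that a block is masked when its center falls inside the grown area, and the exponentially distributed expansion controlled by `mean_expansion` are all hypothetical choices made for illustration. The code only shows the general idea: positions are expressed relative to other blocks rather than as absolute coordinates, and masking removes a spatially contiguous region rather than a span of a 1D token sequence.

```python
import random
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Block:
    """Hypothetical OCR text block with a normalized box (x0, y0, x1, y1) in [0, 1]."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float

    def center(self) -> Tuple[float, float]:
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


def relative_positions(blocks: List[Block]) -> List[List[Tuple[float, float]]]:
    """Pairwise (dx, dy) offsets between block top-left corners.

    A simplified stand-in for relative spatial encoding: a model would embed
    these offsets instead of each block's absolute coordinates.
    """
    return [[(b.x0 - a.x0, b.y0 - a.y0) for b in blocks] for a in blocks]


def area_mask(blocks: List[Block], rng: random.Random,
              mean_expansion: float = 0.05,
              mask_token: str = "[MASK]") -> Tuple[List[str], Set[int]]:
    """Mask every block whose center lies in an area grown around a random block."""
    anchor = rng.choice(blocks)
    # Assumption: the margins added on each axis are drawn from an
    # exponential distribution with mean `mean_expansion`.
    dx = rng.expovariate(1.0 / mean_expansion)
    dy = rng.expovariate(1.0 / mean_expansion)
    ax0, ay0 = anchor.x0 - dx, anchor.y0 - dy
    ax1, ay1 = anchor.x1 + dx, anchor.y1 + dy

    masked: Set[int] = set()
    texts: List[str] = []
    for i, b in enumerate(blocks):
        cx, cy = b.center()
        if ax0 <= cx <= ax1 and ay0 <= cy <= ay1:
            masked.add(i)
            # Replace every whitespace-separated token of the block.
            texts.append(" ".join(mask_token for _ in b.text.split()))
        else:
            texts.append(b.text)
    return texts, masked


if __name__ == "__main__":
    receipt = [
        Block("Total", 0.10, 0.80, 0.20, 0.83),
        Block("$12.50", 0.70, 0.80, 0.85, 0.83),
        Block("Thank you", 0.30, 0.10, 0.60, 0.14),
    ]
    print(relative_positions(receipt)[0])
    print(area_mask(receipt, random.Random(0)))
```

Because the anchor block's center always lies inside its own grown area, at least one block is masked on every call. Larger `mean_expansion` values tend to sweep in neighboring blocks, so the model must recover a whole region from the surrounding layout rather than a single span of a 1D sequence.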