BROS: A Layout-Aware Pre-trained Language Model for Understanding Documents

Understanding documents from their visual snapshots is an emerging problem that requires both advanced computer vision and NLP methods. Recent advances in OCR enable accurate recognition of text blocks, yet extracting key information from documents remains challenging because of the diversity of their layouts. Recent studies on pre-trained language models show the importance of incorporating layout information for this task. However, the way texts are combined with their layouts still follows the style of BERT, which is optimized for understanding 1D text. This suggests room for further improvement that accounts for the 2D nature of text layouts. This paper introduces a pre-trained language model, BERT Relying On Spatiality (BROS), which effectively utilizes the information contained in individual text blocks and their layouts. Specifically, BROS encodes spatial information using relative positions and learns spatial dependencies between OCR blocks through a novel area-masking strategy. These two approaches yield an efficient encoding of spatial layout information, as highlighted by the robust performance of BROS in low-resource settings. We also introduce a general-purpose parser that can be combined with BROS to extract key information even when no order information between text blocks is available. BROS shows superior performance on four public benchmarks (FUNSD, SROIE*, CORD, and SciTSR) and robustness in practical cases where the order of text blocks is not available. Further experiments with varying numbers of training examples demonstrate the high training efficiency of our approach. Our code will be made publicly available.
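The abstract names two mechanisms, relative position encoding and area masking over OCR blocks, without giving details. The Python sketch below illustrates the general ideas under our own assumptions. It is not the authors' implementation. The `Block` class, the use of top-left-corner offsets, the sinusoidal feature size, and the exponentially distributed expansion factor in `area_mask` are all hypothetical choices made for illustration.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical OCR text block: its tokens and box (x0, y0, x1, y1), page-normalized."""
    tokens: list[str]
    box: tuple[float, float, float, float]


def sinusoidal(value: float, dim: int) -> list[float]:
    """Transformer-style sinusoidal features of a scalar (dim must be even)."""
    feats = []
    for i in range(dim // 2):
        freq = 1.0 / (10000 ** (2 * i / dim))
        feats.append(math.sin(value * freq))
        feats.append(math.cos(value * freq))
    return feats


def relative_position(a: Block, b: Block, dim: int = 8) -> list[float]:
    """Encode where b sits relative to a via sinusoids of the x- and y-offsets
    between their top-left corners (a simplification: one corner only)."""
    dx = b.box[0] - a.box[0]
    dy = b.box[1] - a.box[1]
    return sinusoidal(dx, dim) + sinusoidal(dy, dim)


def area_mask(blocks: list[Block], rng: random.Random,
              mask_token: str = "[MASK]", mean_extra: float = 1.0) -> list[Block]:
    """Pick a random anchor block, enlarge its box by a random factor, and mask
    every token of every block whose centre falls inside the enlarged area."""
    anchor = rng.choice(blocks)
    x0, y0, x1, y1 = anchor.box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    # Hypothetical choice: scale = 1 + Exp(mean=mean_extra), so the area never shrinks.
    scale = 1.0 + rng.expovariate(1.0 / mean_extra)
    half_w, half_h = (x1 - x0) * scale / 2, (y1 - y0) * scale / 2
    result = []
    for blk in blocks:
        bx = (blk.box[0] + blk.box[2]) / 2
        by = (blk.box[1] + blk.box[3]) / 2
        inside = abs(bx - cx) <= half_w and abs(by - cy) <= half_h
        # Whole blocks are masked together, never individual tokens.
        tokens = [mask_token] * len(blk.tokens) if inside else list(blk.tokens)
        result.append(Block(tokens, blk.box))
    return result


if __name__ == "__main__":
    page = [
        Block(["Total"], (0.10, 0.80, 0.20, 0.85)),
        Block(["12.50"], (0.70, 0.80, 0.80, 0.85)),
        Block(["Thank", "you"], (0.30, 0.90, 0.60, 0.95)),
    ]
    print(len(relative_position(page[0], page[1])))  # 16
    for blk in area_mask(page, random.Random(0)):
        print(blk.tokens)
```

Because the relative encoding depends only on coordinate offsets, it needs no reading order between blocks. Masking a whole region, rather than scattered tokens, is meant to force a model to recover the hidden text from spatially nearby blocks. That is one plausible reading of how the abstract motivates learning spatial dependencies.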
