Understanding documents with rich layouts is an essential step towards information extraction. Business intelligence processes often require extracting useful semantic content from documents at large scale for subsequent decision-making tasks. In this context, instance-level segmentation of different document objects (titles, sections, figures, tables, and so on) has emerged as an interesting problem for the document layout analysis community. To advance research in this direction, we present a transformer-based model for end-to-end segmentation of complex layouts in document images. To our knowledge, this is the first work on transformer-based document segmentation. Extensive experiments on the PubLayNet dataset show that our model achieves segmentation performance comparable to or better than existing state-of-the-art approaches. We hope our simple and flexible framework can serve as a promising baseline for instance-level recognition tasks in document images.