Natural language processing for document scans and PDFs has the potential to enormously improve the efficiency of business processes. Layout-aware word embeddings such as LayoutLM have shown promise for classification of and information extraction from such documents. This paper proposes a new pre-training task, position masking, that can improve the performance of layout-aware word embeddings that incorporate 2-D position embeddings. We compare models pre-trained with only language masking against models pre-trained with both language masking and position masking, and we find that position masking improves performance by over 5% on a form understanding task.
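The position-masking idea described above can be sketched in data-preparation terms: by analogy with masked language modeling, a fraction of tokens have their 2-D positions (bounding boxes) replaced by a placeholder, and the original boxes become prediction targets. The following is a minimal illustrative sketch, not the paper's implementation; the `MASK_BOX` placeholder, the `mask_positions` helper, and the 15% masking rate are assumptions chosen for illustration.

```python
import random

# Hypothetical placeholder box standing in for a masked 2-D position.
MASK_BOX = (0, 0, 0, 0)

def mask_positions(boxes, mask_prob=0.15, seed=1):
    """Randomly mask bounding boxes, keeping the originals as targets.

    Mirrors masked language modeling, but applied to 2-D position
    embeddings instead of token identities (an illustrative sketch).
    """
    rng = random.Random(seed)
    masked, targets = [], []
    for box in boxes:
        if rng.random() < mask_prob:
            masked.append(MASK_BOX)  # model sees only the placeholder
            targets.append(box)      # model must reconstruct this box
        else:
            masked.append(box)       # position left intact
            targets.append(None)     # excluded from the position loss
    return masked, targets

# Toy bounding boxes (x0, y0, x1, y1) for four tokens on a page.
boxes = [(10, 20, 50, 30), (60, 20, 90, 30),
         (10, 40, 50, 50), (60, 40, 95, 50)]
masked, targets = mask_positions(boxes)
```

During pre-training, a regression or classification head over the masked positions would then be trained jointly with the usual language-masking objective, which is the comparison the abstract describes.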