Documents are a core part of many businesses in fields such as law, finance, and technology. Automatic understanding of documents such as invoices, contracts, and resumes is lucrative and opens up many new avenues of business. Natural language processing and computer vision have seen tremendous progress through the development of deep learning, and these methods are now being incorporated into contemporary document understanding systems. In this survey paper, we review different techniques for understanding documents written in English and consolidate the methodologies present in the literature to act as a jumping-off point for researchers exploring this area.