Arctic-Extract is a state-of-the-art model designed for extracting structured data (question answering, entities, and tables) from scanned or digital-born business documents. Despite its SoTA capabilities, the model is deployable on resource-constrained hardware: weighing only 6.6 GiB, it fits on devices with limited resources, such as A10 GPUs with 24 GB of memory. Arctic-Extract can process up to 125 A4 pages on those GPUs, making it suitable for long-document processing. This paper describes Arctic-Extract's training protocols and evaluation results, demonstrating its strong performance in document understanding.