Computer-aided tabular data extraction has always been a challenging and error-prone task because it demands both spectral and spatial sanity of the data. In this paper, we discuss an approach to tabular data extraction in the realm of document comprehension. Given the variety of tabular formats found across documents, we present a novel computer vision approach for extracting tabular data from images or from vector PDFs converted to images.