The global Information and Communications Technology (ICT) supply chain is a complex network made up of participants of all types. It is often modeled as a social network in order to study the relations, properties, and development of the supply chain network in supply chain management. Information sharing plays a crucial role in improving supply chain efficiency, and datasheets are the most common format for describing e-component commodities in the ICT supply chain because they are human-readable. However, the rapidly growing volume of electronic documents now far exceeds what human readers can process, and tabular data is also hard to process automatically because of complex table structures and heterogeneous layouts. Table Structure Recognition (TSR) aims to represent tables with complex structures in a machine-interpretable format so that the tabular data can be processed automatically. In this paper, we formulate TSR as an object detection problem and propose generating an intuitive representation of a complex table structure, which makes it possible to structure the tabular data related to the commodities. To cope with borderless and small layouts, we propose a cost-sensitive loss function that takes the detection difficulty of each class into account. In addition, we propose a novel anchor generation method that exploits a structural property of tables: all columns in a table share the same height, and all rows in a table share the same width. We implement the proposed method on top of Faster-RCNN and achieve 94.79% mean Average Precision (AP); the method also consistently improves AP by more than 1.5% across different benchmark models.
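The structural property behind the anchor generation idea can be illustrated with a small sketch. This is not the paper's implementation: the function name `table_aware_anchors`, the `fractions` and `stride_ratio` parameters, and the assumption that a table bounding box is already known are all hypothetical choices made for illustration. The sketch only shows the constraint stated in the abstract, namely that column anchors span the full table height and row anchors span the full table width.

```python
from typing import List, Sequence, Tuple

# A box is (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def table_aware_anchors(
    table: Box,
    fractions: Sequence[float] = (0.1, 0.25, 0.5),  # hypothetical anchor sizes
    stride_ratio: float = 0.5,                      # hypothetical overlap setting
) -> Tuple[List[Box], List[Box]]:
    """Return (column_anchors, row_anchors) for one table box.

    Assumption for illustration only: column anchors always take the full
    table height and row anchors always take the full table width, so only
    the remaining dimension is varied and slid across the table.
    """
    x0, y0, x1, y1 = table
    width, height = x1 - x0, y1 - y0
    eps = 1e-9  # tolerance for floating-point accumulation
    columns: List[Box] = []
    rows: List[Box] = []

    for f in fractions:
        # Column anchors: fixed height (the table's), width = f * table width.
        w = f * width
        step = w * stride_ratio
        x = x0
        while x + w <= x1 + eps:
            columns.append((x, y0, x + w, y1))
            x += step

        # Row anchors: fixed width (the table's), height = f * table height.
        h = f * height
        step = h * stride_ratio
        y = y0
        while y + h <= y1 + eps:
            rows.append((x0, y, x1, y + h))
            y += step

    return columns, rows


if __name__ == "__main__":
    cols, rows = table_aware_anchors((0.0, 0.0, 100.0, 40.0), fractions=(0.5,))
    print(len(cols), "column anchors;", len(rows), "row anchors")
```

Compared with generic anchors that vary both width and height, fixing one dimension per class shrinks the search space the detector has to cover, which is one plausible reason such a constraint could help with rows and columns in particular.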