Specialized transformer-based models for encoding tabular data have gained interest in academia. Although tabular data is omnipresent in industry, applications of table transformers there are still missing. In this paper, we study how these models can be applied to an industrial Named Entity Recognition (NER) problem where the entities are mentioned in tabular-structured spreadsheets. The highly technical nature of the spreadsheets as well as the lack of labeled data present major challenges for fine-tuning transformer-based models. We therefore develop a dedicated table data augmentation strategy based on available domain-specific knowledge graphs and show that it considerably boosts performance in our low-resource scenario. Further, we investigate the benefit of tabular structure as an inductive bias compared to treating tables as linearized sequences. Our experiments confirm that a table transformer outperforms the other baselines and that its tabular inductive bias is vital for the convergence of transformer-based models.
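The abstract contrasts two ways of presenting a table to a model: flattening it into a single linearized token sequence versus keeping the row and column structure available as an inductive bias. The minimal sketch below (with hypothetical example cells and helper names, not taken from the paper) illustrates this difference; a table transformer would typically encode the (row, column) coordinates of each cell, for example as dedicated positional embeddings, whereas a plain sequence model only sees the flattened text.

```python
# Minimal sketch (illustrative only, not the paper's implementation):
# contrasting a linearized table view with a structured cell-grid view.
from typing import List, Tuple

# Hypothetical spreadsheet fragment with technical entity mentions.
table: List[List[str]] = [
    ["Tag", "Description", "Unit"],
    ["TI-101", "Reactor inlet temperature", "degC"],
    ["PI-202", "Feed pump discharge pressure", "bar"],
]

def linearize(rows: List[List[str]], sep: str = " | ") -> str:
    """Flatten the table row by row; structural information becomes implicit."""
    return " [ROW] ".join(sep.join(row) for row in rows)

def structured_view(rows: List[List[str]]) -> List[Tuple[str, int, int]]:
    """Keep each cell together with its (row, column) coordinates, which a
    table transformer can embed explicitly as a tabular inductive bias."""
    return [(cell, r, c) for r, row in enumerate(rows) for c, cell in enumerate(row)]

print(linearize(table))
print(structured_view(table)[:4])
```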