A Study on Reproducibility and Replicability of Table Structure Recognition Methods - Zhuanzhi Papers


Structure recognition · Datasets · Recognition methods · Recognition · Papers

April 20, 2023

A Study on Reproducibility and Replicability of Table Structure Recognition Methods


Kehinde Ajayi, Muntabhir Hasan Choudhury, Sarah Rajtmajer, Jian Wu

From arXiv, 10 pages, 5 figures

Concerns about reproducibility in artificial intelligence (AI) have emerged, as researchers have reported unsuccessful attempts to directly reproduce published findings in the field. Replicability, the ability to affirm a finding using the same procedures on new data, has not been well studied. In this paper, we examine both reproducibility and replicability of a corpus of 16 papers on table structure recognition (TSR), an AI task aimed at identifying cell locations of tables in digital documents. We attempt to reproduce published results using codes and datasets provided by the original authors. We then examine replicability using a dataset similar to the original as well as a new dataset, GenTSR, consisting of 386 annotated tables extracted from scientific papers. Out of 16 papers studied, we reproduce results consistent with the original in only four. Two of the four papers are identified as replicable using the similar dataset under certain IoU values. No paper is identified as replicable using the new dataset. We offer observations on the causes of irreproducibility and irreplicability. All code and data are available on Codeocean at https://codeocean.com/capsule/6680116/tree.

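The abstract reports replicability "under certain IoU values" but does not spell out the evaluation procedure. As a rough illustration only, the sketch below assumes a common convention: a predicted cell counts as correct when its bounding box overlaps an unmatched ground-truth cell with intersection over union (IoU) at or above a chosen threshold, and precision, recall, and F1 follow from those matches. The box format, the greedy matching, and the function names `iou` and `precision_recall_f1` are assumptions made for this example, not the authors' code.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall_f1(
    predicted: List[Box], truth: List[Box], threshold: float
) -> Tuple[float, float, float]:
    """Greedily match each predicted cell to its best unmatched truth cell.

    A match counts as a true positive only if its IoU >= threshold.
    This greedy scheme is a simplification chosen for illustration.
    """
    matched = set()
    true_positives = 0
    for p in predicted:
        best_index, best_score = None, 0.0
        for j, t in enumerate(truth):
            if j in matched:
                continue
            score = iou(p, t)
            if score > best_score:
                best_index, best_score = j, score
        if best_index is not None and best_score >= threshold:
            matched.add(best_index)
            true_positives += 1
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Two made-up ground-truth cells and two made-up predictions.
    truth_cells = [(0, 0, 10, 10), (10, 0, 20, 10)]
    predicted_cells = [(0, 0, 10, 10), (12, 0, 20, 10)]
    for thr in (0.6, 0.9):
        print(thr, precision_recall_f1(predicted_cells, truth_cells, thr))
```

In this toy input the second prediction overlaps its ground-truth cell with an IoU of 0.8, so it counts as correct at a threshold of 0.6 but not at 0.9. That sensitivity is one reason a result can look replicable at some IoU values and not at others.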

