Charts are a powerful tool for visually conveying complex data, but their comprehension poses a challenge due to the diverse chart types and intricate components. Existing chart comprehension methods either depend on heuristic rules or over-rely on OCR systems, resulting in suboptimal performance. To address these issues, we present ChartReader, a unified framework that seamlessly integrates chart derendering and comprehension tasks. Our approach includes a transformer-based chart component detection module and an extended pre-trained vision-language model for chart-to-X tasks. By learning the rules of charts automatically from annotated datasets, our approach eliminates the need for manual rule-making, reducing effort and enhancing accuracy. We also introduce a data variable replacement technique and extend the input and position embeddings of the pre-trained model for cross-task training. We evaluate ChartReader on Chart-to-Table, ChartQA, and Chart-to-Text tasks, demonstrating its superiority over existing methods. Our proposed framework can significantly reduce the manual effort involved in chart analysis, providing a step towards a universal chart understanding model. Moreover, our approach offers opportunities for plug-and-play integration with mainstream LLMs such as T5 and TaPas, extending their capability to chart comprehension tasks. The code is available at https://github.com/zhiqic/ChartReader.
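The data variable replacement idea can be sketched as follows: numeric data values in chart text are swapped for placeholder tokens so the model learns chart structure rather than memorizing specific values, with the mapping kept for later restoration. This is a minimal illustration only, assuming hypothetical placeholder tokens such as `<v0>`; the function names and token format are not the paper's actual implementation.

```python
import re

def replace_variables(text):
    """Replace numeric data values with placeholder tokens; return the
    templated text plus a token-to-value mapping for restoration."""
    mapping = {}

    def to_token(match):
        token = f"<v{len(mapping)}>"
        mapping[token] = match.group(0)
        return token

    templated = re.sub(r"\d+(?:\.\d+)?", to_token, text)
    return templated, mapping

def restore_variables(text, mapping):
    """Substitute placeholder tokens back with their original values."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text

templated, mapping = replace_variables("Sales rose from 120 to 345.5 in 2021.")
# templated: "Sales rose from <v0> to <v1> in <v2>."
restored = restore_variables(templated, mapping)
# restored: "Sales rose from 120 to 345.5 in 2021."
```

In a cross-task training setup, the templated text would be fed to the pre-trained model, and the mapping applied to its output, so the same model can serve table extraction, QA, and summarization without overfitting to particular numbers.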