Charts are a powerful tool for visually conveying complex data, but their comprehension poses a challenge due to the diverse chart types and intricate components. Existing chart comprehension methods either depend on heuristic rules or over-rely on OCR systems, resulting in suboptimal performance. To address these issues, we present ChartReader, a unified framework that seamlessly integrates chart derendering and comprehension tasks. Our approach includes a transformer-based chart component detection module and an extended pre-trained vision-language model for chart-to-X tasks. By learning the rules of charts automatically from annotated datasets, our approach eliminates the need for manual rule-making, reducing effort and enhancing accuracy. We also introduce a data variable replacement technique and extend the input and position embeddings of the pre-trained model for cross-task training. We evaluate ChartReader on Chart-to-Table, ChartQA, and Chart-to-Text tasks, demonstrating its superiority over existing methods. Our proposed framework can significantly reduce the manual effort involved in chart analysis, providing a step towards a universal chart understanding model. Moreover, our approach offers opportunities for plug-and-play integration with mainstream LLMs such as T5 and TaPas, extending their capability to chart comprehension tasks. The code is available at https://github.com/zhiqic/ChartReader.
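The data variable replacement idea can be sketched as follows: numeric data values in chart text are swapped for placeholder tokens so the model learns chart structure rather than memorizing specific values, with the mapping kept for later restoration. This is a minimal illustration only, assuming hypothetical placeholder tokens such as `<v0>`; the function names and token format are not the paper's actual implementation.

```python
import re

def replace_variables(text):
    """Replace numeric data values with placeholder tokens; return the
    templated text plus a token-to-value mapping for restoration."""
    mapping = {}

    def to_token(match):
        token = f"<v{len(mapping)}>"
        mapping[token] = match.group(0)
        return token

    templated = re.sub(r"\d+(?:\.\d+)?", to_token, text)
    return templated, mapping

def restore_variables(text, mapping):
    """Substitute placeholder tokens back with their original values."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text

templated, mapping = replace_variables("Sales rose from 120 to 345.5 in 2021.")
# templated: "Sales rose from <v0> to <v1> in <v2>."
restored = restore_variables(templated, mapping)
# restored: "Sales rose from 120 to 345.5 in 2021."
```

In a cross-task training setup, the templated text would be fed to the pre-trained model, and the mapping applied to its output, so the same model can serve table extraction, QA, and summarization without overfitting to particular numbers.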