ChartAB: A Benchmark for Chart Grounding & Dense Alignment

Charts play an important role in visualization, reasoning, data analysis, and the exchange of ideas among humans. However, existing vision-language models (VLMs) still lack accurate perception of details and struggle to extract fine-grained structures from charts. Such limitations in chart grounding also hinder their ability to compare multiple charts and reason over them. In this paper, we introduce a novel "ChartAlign Benchmark (ChartAB)" to provide a comprehensive evaluation of VLMs in chart grounding tasks, i.e., extracting tabular data, localizing visualization elements, and recognizing various attributes from charts of diverse types and complexities. We design a JSON template to facilitate the calculation of evaluation metrics specifically tailored for each grounding task. By incorporating a novel two-stage inference workflow, the benchmark can further evaluate VLMs' capability to align and compare elements/attributes across two charts. Our analysis of evaluations on several recent VLMs reveals new insights into their perception biases, weaknesses, robustness, and hallucinations in chart understanding. These findings highlight the fine-grained discrepancies among VLMs in chart understanding tasks and point to specific skills that need to be strengthened in current models.

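
The two-stage idea described in the abstract can be illustrated with a minimal sketch: stage 1 grounds each chart into a JSON record, and stage 2 compares the two records. This is one plausible reading, not the benchmark's actual pipeline. The field names (`chart_type`, `data`), the helper `diff_data`, the tolerance `tol`, and the sample values are all hypothetical and chosen only for illustration; the real JSON template and metrics are defined by the paper.

```python
import json

# Hypothetical stage-1 outputs: one grounding record per chart.
# The field names ("chart_type", "data") are illustrative assumptions,
# not the benchmark's actual JSON template.
chart_a = json.loads(
    '{"chart_type": "bar", "data": {"2021": 10.0, "2022": 12.5, "2023": 15.0}}'
)
chart_b = json.loads(
    '{"chart_type": "bar", "data": {"2021": 10.0, "2022": 14.0, "2023": 15.0}}'
)


def diff_data(a, b, tol=1e-6):
    """Stage 2 (sketch): report keys whose values differ between two grounded charts.

    A key missing from one chart is reported with None on that side.
    """
    changed = {}
    for key in sorted(set(a["data"]) | set(b["data"])):
        va, vb = a["data"].get(key), b["data"].get(key)
        if va is None or vb is None or abs(va - vb) > tol:
            changed[key] = (va, vb)
    return changed


differences = diff_data(chart_a, chart_b)
print(differences)  # {'2022': (12.5, 14.0)}
```

Comparing structured records rather than raw images keeps the alignment step easy to score: a predicted set of differences can be checked against a ground-truth set key by key.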