Multimodal Large Language Models (MLLMs) can interpret data visualizations, but what makes a visualization understandable to these models? Do factors like color, shape, and text influence legibility, and how does this compare to human perception? In this paper, we build on prior work to systematically assess which visualization characteristics impact MLLM interpretability. We expanded the Visualization Literacy Assessment Test (VLAT) test set from 12 to 380 visualizations by varying plot types, colors, and titles. This allowed us to statistically analyze how these features affect model performance. Our findings suggest that while color palettes have no significant impact on accuracy, plot types and the type of title significantly affect MLLM performance. We observe similar trends for model omissions. Based on these insights, we look into which plot types are beneficial for MLLMs in different tasks and propose visualization design principles that enhance MLLM readability. Additionally, we make the extended VLAT test set, VLAT ex, publicly available on https://osf.io/ermwx/ together with our supplemental material for future model testing and evaluation.
翻译:暂无翻译