Natural language and visualization are increasingly deployed together to support data analysis in different ways, from multimodal interaction to enriched data summaries and insights. Yet researchers still lack systematic knowledge of how viewers verbalize their interpretations of visualizations, and of how they interpret verbalizations of visualizations in such contexts. We describe two studies aimed at identifying characteristics of data and charts that are relevant to such tasks. The first study asks participants to verbalize what they see in scatterplots that depict various levels of correlation. The second study then asks participants to choose visualizations that match a given verbal description of correlation. We extract key concepts from the responses, organize them into a taxonomy, and analyze the categorized responses. We observe that participants use a wide range of vocabulary across all scatterplots, but that particular concepts are preferred for higher levels of correlation. A comparison between the two studies reveals the ambiguity of some of these concepts. We discuss how the results could inform the design of multimodal representations aligned with the data and analytical tasks, and present a research roadmap to deepen understanding of visualizations and natural language.