Retrieving charts from a large corpus is a fundamental task that can benefit numerous applications such as visualization recommendation. The retrieved results are expected to conform to both explicit visual attributes (e.g., chart type, colormap) and implicit user intents (e.g., design style, context information) that vary across application scenarios. However, existing example-based chart retrieval methods are built upon non-decoupled, low-level visual features that are hard to interpret, while definition-based ones are constrained to pre-defined attributes that are hard to extend. In this work, we propose a new framework, namely WYTIWYR (What-You-Think-Is-What-You-Retrieve), that integrates user intents into the chart retrieval process. The framework consists of two stages: first, the Annotation stage disentangles the visual attributes within the bitmap query chart; second, the Retrieval stage embeds the user's intent with a customized text prompt as well as the query chart to recall the targeted retrieval results. We develop a prototype WYTIWYR system that leverages a contrastive language-image pre-training (CLIP) model to achieve zero-shot classification, and test the prototype on a large corpus of charts crawled from the Internet. Quantitative experiments, case studies, and qualitative interviews demonstrate the usability and effectiveness of our proposed framework.
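To make the zero-shot classification step concrete, the snippet below is a minimal sketch of CLIP-based attribute classification for a query chart. It assumes the Hugging Face transformers library with the public openai/clip-vit-base-patch32 checkpoint; the candidate labels and prompt template are illustrative placeholders, not the paper's actual prompt set.

```python
# Minimal sketch: zero-shot chart-type classification with CLIP.
# Assumptions (not from the paper): Hugging Face transformers,
# the public "openai/clip-vit-base-patch32" checkpoint, and an
# illustrative label set / prompt template.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

chart = Image.open("query_chart.png")  # the bitmap query chart

# Text prompts encode candidate values of one visual attribute (chart type).
labels = ["bar chart", "line chart", "scatter plot", "pie chart"]
prompts = [f"a {label}" for label in labels]

inputs = processor(text=prompts, images=chart, return_tensors="pt", padding=True)
logits = model(**inputs).logits_per_image  # image-to-text similarity scores
probs = logits.softmax(dim=-1)[0]

# The attribute value whose text embedding best matches the chart image.
print(labels[probs.argmax().item()])
```

Because CLIP scores arbitrary text prompts against the image, the same mechanism extends to other attributes (e.g., colormap) or to free-form intent prompts without retraining, which is what makes the annotation step extensible beyond a fixed attribute vocabulary.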