Choropleth maps are a common visual representation for region-specific tabular data and appear in many venues (newspapers, articles, etc.). These maps are human-readable but are often challenging to process when extracting data for screen readers, analyses, or other related tasks. Recent research into Visual Question Answering (VQA) has studied question answering on human-generated charts (ChartQA), such as bar, line, and pie charts. However, little work has addressed understanding maps; both general VQA models and ChartQA models struggle on this task. To facilitate and encourage research in this area, we present MapQA, a large-scale dataset of ~800K question-answer pairs over ~60K map images. Our task tests various levels of map understanding, from surface questions about map styles to complex questions that require reasoning over the underlying data. We describe the unique challenges of MapQA that frustrate most strong baseline algorithms designed for ChartQA and general VQA tasks. We also present a novel algorithm for MapQA, Visual Multi-Output Data Extraction based QA (V-MODEQA). V-MODEQA extracts the underlying structured data from a map image with a multi-output model and then performs reasoning on the extracted data. Our experimental results show that, by capturing the unique properties of map question answering, V-MODEQA achieves better overall performance and robustness on MapQA than state-of-the-art ChartQA and VQA algorithms.