Bar charts are an effective way to convey numeric information, but today's algorithms cannot parse them. Existing methods fail when faced with even minor variations in appearance. Here, we present DVQA, a dataset that tests many aspects of bar chart understanding in a question answering framework. Unlike visual question answering (VQA), DVQA requires processing words and answers that are unique to a particular bar chart. State-of-the-art VQA algorithms perform poorly on DVQA, and we propose two strong baselines that perform considerably better. Our work will enable algorithms to automatically extract numeric and semantic information from vast quantities of bar charts found in scientific publications, Internet articles, business reports, and many other areas.