Infographics are often an integral part of scientific documents, where they report qualitative or quantitative findings and make complex underlying information much easier to comprehend. However, interpreting them remains a challenge for blind, low-vision, and other print-impaired (BLV) individuals. In this paper, we propose ChartParser, a fully automated pipeline that leverages deep learning, OCR, and image processing techniques to extract all figures from a research paper, classify them into chart categories (bar chart, line chart, etc.), and obtain relevant information from them. We focus on bar charts (horizontal, vertical, stacked horizontal, and stacked vertical), which already pose several interesting challenges. Finally, we present the retrieved content in a tabular format that is screen-reader friendly and accessible to BLV users. We evaluate our approach thoroughly by applying the pipeline to a sample of real-world annotated bar charts from research papers.
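As an illustration of the final step only, the sketch below shows one way values recovered from a bar chart could be rendered as an HTML table with a caption and scoped headers, which screen readers can use to announce each cell in context. It is not ChartParser's implementation: the function name `bars_to_html_table`, its parameters, and the sample values are all hypothetical, and the abstract does not specify the output format beyond "tabular".

```python
from html import escape


def bars_to_html_table(title, category_label, value_label, bars):
    """Render (category, value) pairs as an HTML table with scoped headers.

    Hypothetical helper for illustration; not part of ChartParser.
    """
    # One row per bar: the category is a row header, the value a data cell.
    rows = "\n".join(
        f'    <tr><th scope="row">{escape(str(name))}</th>'
        f"<td>{escape(str(value))}</td></tr>"
        for name, value in bars
    )
    return (
        "<table>\n"
        f"  <caption>{escape(title)}</caption>\n"
        "  <thead>\n"
        f'    <tr><th scope="col">{escape(category_label)}</th>'
        f'<th scope="col">{escape(value_label)}</th></tr>\n'
        "  </thead>\n"
        "  <tbody>\n"
        f"{rows}\n"
        "  </tbody>\n"
        "</table>"
    )


# Made-up values standing in for what a chart parser might recover.
example = bars_to_html_table(
    "Accuracy by model", "Model", "Accuracy (%)", [("A", 81.5), ("B", 77.0)]
)
print(example)
```

The caption carries the chart title, and the `scope` attributes tie each value to its category and axis label, so the table remains readable when navigated cell by cell.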