PDF-VQA：一份新的PDF文件视觉问答数据集 (PDF-VQA: A New Dataset for Real-World VQA on PDF Documents) - 专知论文

会员服务 ·

0

视觉问答 · 问答 · 数据集 · 结构 · 结构关系 ·

2023 年 4 月 13 日

PDF-VQA: A New Dataset for Real-World VQA on PDF Documents

翻译：PDF-VQA：一份新的PDF文件视觉问答数据集

Yihao Ding,Siwen Luo,Hyunsuk Chung,Soyeon Caren Han

Document-based Visual Question Answering examines the document understanding of document images in conditions of natural language questions. We proposed a new document-based VQA dataset, PDF-VQA, to comprehensively examine the document understanding from various aspects, including document element recognition, document layout structural understanding as well as contextual understanding and key information extraction. Our PDF-VQA dataset extends the current scale of document understanding that limits on the single document page to the new scale that asks questions over the full document of multiple pages. We also propose a new graph-based VQA model that explicitly integrates the spatial and hierarchically structural relationships between different document elements to boost the document structural understanding. The performances are compared with several baselines over different question types and tasks\footnote{The full dataset will be released after paper acceptance.

翻译：---- 文档视觉问答研究探讨了自然语言问题条件下，对文档图像的文档理解。我们提出了一个新的文档视觉问答数据集PDF-VQA，全面考虑了各个方面的文档理解，包括文档元素识别、文档布局结构理解，以及上下文理解和关键信息提取。我们的PDF-VQA数据集扩展了当前在单个文档页面上进行文档理解的规模，包括多页文档的问答。我们还提出了一种新的基于图形的VQA模型，显式地集成了不同文档元素之间的空间和分层结构关系，以提高文档结构理解能力。我们比较了不同问题类型和任务上的多个基线体系的表现。在论文接受后，将公布完整数据集。

0

相关内容

视觉问答

视觉问答（Visual Question Answering，VQA），是一种涉及计算机视觉和自然语言处理的学习任务。这一任务的定义如下： A VQA system takes as input an image and a free-form, open-ended, natural-language question about the image and produces a natural-language answer as the output[1]。翻译为中文：一个VQA系统以一张图片和一个关于这张图片形式自由、开放式的自然语言问题作为输入，以生成一条自然语言答案作为输出。简单来说，VQA就是给定的图片进行问答。

知识荟萃

精品入门和进阶教程、论文和代码整理等

更多

查看相关VIP内容、论文、资讯等

【Hugging Face】使用自定义数据集微调语义分割模型，Fine-Tune a Semantic Segmentation Model with a Custom Dataset

【Hugging Face】使用自定义数据集微调语义分割模型，Fine-Tune a Semantic Segmentation Model with a Custom Dataset

专知会员服务

21+阅读 · 2022年3月18日

近期必读的七篇AAAI 2021【问答（QA）】相关论文和代码

专知会员服务

55+阅读 · 2021年2月2日

【Google论文】ALBERT:自我监督学习语言表达的精简BERT

【Google论文】ALBERT:自我监督学习语言表达的精简BERT

专知会员服务

24+阅读 · 2019年11月4日

【视频中的零样本动作识别：综述】Zero-Shot Action Recognition in Videos: A Survey

【视频中的零样本动作识别：综述】Zero-Shot Action Recognition in Videos: A Survey

专知会员服务

39+阅读 · 2019年10月12日

计算机视觉最佳实践、代码示例和相关文档

计算机视觉最佳实践、代码示例和相关文档

专知会员服务

20+阅读 · 2019年10月9日

论文小综 | Using External Knowledge on VQA

论文小综 | Using External Knowledge on VQA

开放知识图谱

10+阅读 · 2020年10月18日

视频分析/多模态学习论文、代码、数据集大列表

视频分析/多模态学习论文、代码、数据集大列表

专知

57+阅读 · 2019年7月13日

【论文推荐】最新六篇自动问答相关论文—无监督迁移学习、综述、生成式问答、QDEE、可扩展文档理解

【论文推荐】最新六篇自动问答相关论文—无监督迁移学习、综述、生成式问答、QDEE、可扩展文档理解

专知

12+阅读 · 2018年5月9日

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

专知

30+阅读 · 2018年3月22日

【论文推荐】最新6篇视觉问答（VQA）相关论文—目标推理、深度循环模型、可解释性、数据可视化、Triplet学习、基准

【论文推荐】最新6篇视觉问答（VQA）相关论文—目标推理、深度循环模型、可解释性、数据可视化、Triplet学习、基准

专知

15+阅读 · 2018年2月3日

面向PDF文档的数学公式搜索技术研究

国家自然科学基金

3+阅读 · 2014年12月31日

计算力学基本计算及可视化工具程序包的开发与集成

国家自然科学基金

2+阅读 · 2012年12月31日

基于超图形XGML的图像半结构化研究

国家自然科学基金

0+阅读 · 2012年12月31日

斜卧青霉中新转录因子PD003430对纤维素酶基因表达调控的研究

国家自然科学基金

0+阅读 · 2012年12月31日

结核分枝杆菌分泌脂蛋白对宿主巨噬细胞信号通路调控的分子机制

国家自然科学基金

0+阅读 · 2012年12月31日

Towards Weakly-Supervised Hate Speech Classification Across Datasets

Arxiv

0+阅读 · 2023年5月30日

Generate then Select: Open-ended Visual Question Answering Guided by World Knowledge

Arxiv

0+阅读 · 2023年5月30日

Masked World Models for Visual Control

Arxiv

0+阅读 · 2023年5月27日

SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning

Arxiv

11+阅读 · 2021年12月16日

Learning to Count Objects in Natural Images for Visual Question Answering

Arxiv

12+阅读 · 2018年2月15日

VIP会员

文章信息

相关主题

相关VIP内容

【Hugging Face】使用自定义数据集微调语义分割模型，Fine-Tune a Semantic Segmentation Model with a Custom Dataset

【Hugging Face】使用自定义数据集微调语义分割模型，Fine-Tune a Semantic Segmentation Model with a Custom Dataset

专知会员服务

21+阅读 · 2022年3月18日

近期必读的七篇AAAI 2021【问答（QA）】相关论文和代码

专知会员服务

55+阅读 · 2021年2月2日

【Google论文】ALBERT:自我监督学习语言表达的精简BERT

【Google论文】ALBERT:自我监督学习语言表达的精简BERT

专知会员服务

24+阅读 · 2019年11月4日

【视频中的零样本动作识别：综述】Zero-Shot Action Recognition in Videos: A Survey

【视频中的零样本动作识别：综述】Zero-Shot Action Recognition in Videos: A Survey

专知会员服务

39+阅读 · 2019年10月12日

计算机视觉最佳实践、代码示例和相关文档

计算机视觉最佳实践、代码示例和相关文档

专知会员服务

20+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《利用人工智能对军事行动进行建模》

《利用人工智能学习、优化与推演美国海军作战部队的战略布局与分散（续文）》

机器人、无人机与实时影像：应对城市爆炸威胁的三大技术方案

《指挥官意图消息中关键概念自动提取》最新47页

相关资讯

论文小综 | Using External Knowledge on VQA

论文小综 | Using External Knowledge on VQA

开放知识图谱

10+阅读 · 2020年10月18日

视频分析/多模态学习论文、代码、数据集大列表

视频分析/多模态学习论文、代码、数据集大列表

专知

57+阅读 · 2019年7月13日

【论文推荐】最新六篇自动问答相关论文—无监督迁移学习、综述、生成式问答、QDEE、可扩展文档理解

【论文推荐】最新六篇自动问答相关论文—无监督迁移学习、综述、生成式问答、QDEE、可扩展文档理解

专知

12+阅读 · 2018年5月9日

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

专知

30+阅读 · 2018年3月22日

【论文推荐】最新6篇视觉问答（VQA）相关论文—目标推理、深度循环模型、可解释性、数据可视化、Triplet学习、基准

【论文推荐】最新6篇视觉问答（VQA）相关论文—目标推理、深度循环模型、可解释性、数据可视化、Triplet学习、基准

专知

15+阅读 · 2018年2月3日

相关论文

Towards Weakly-Supervised Hate Speech Classification Across Datasets

Arxiv

0+阅读 · 2023年5月30日

Generate then Select: Open-ended Visual Question Answering Guided by World Knowledge

Arxiv

0+阅读 · 2023年5月30日

Masked World Models for Visual Control

Arxiv

0+阅读 · 2023年5月27日

SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning

Arxiv

11+阅读 · 2021年12月16日

Learning to Count Objects in Natural Images for Visual Question Answering

Arxiv

12+阅读 · 2018年2月15日

相关基金

面向PDF文档的数学公式搜索技术研究

国家自然科学基金

3+阅读 · 2014年12月31日

计算力学基本计算及可视化工具程序包的开发与集成

国家自然科学基金

2+阅读 · 2012年12月31日

基于超图形XGML的图像半结构化研究

国家自然科学基金

0+阅读 · 2012年12月31日

斜卧青霉中新转录因子PD003430对纤维素酶基因表达调控的研究

国家自然科学基金

0+阅读 · 2012年12月31日

结核分枝杆菌分泌脂蛋白对宿主巨噬细胞信号通路调控的分子机制

国家自然科学基金

0+阅读 · 2012年12月31日

微信扫码咨询专知VIP会员