A Question-Answering Approach to Key Value Pair Extraction from Form-like Document Images - 专知 Papers

Document images · Extraction · Entities · Question answering · Prediction methods

April 17, 2023


Kai Hu, Zhuoyuan Wu, Zhuoyao Zhong, Weihong Lin, Lei Sun, Qiang Huo

From arXiv, AAAI 2023

In this paper, we present a new question-answering (QA) based key-value pair extraction approach, called KVPFormer, to robustly extract key-value relationships between entities from form-like document images. Specifically, KVPFormer first identifies key entities from all entities in an image with a Transformer encoder, then takes these key entities as questions and feeds them into a Transformer decoder to predict their corresponding answers (i.e., value entities) in parallel. To achieve higher answer prediction accuracy, we further propose a coarse-to-fine answer prediction approach, which first extracts multiple answer candidates for each identified question in the coarse stage and then selects the most likely one among these candidates in the fine stage. In this way, the learning difficulty of answer prediction can be effectively reduced so that the prediction accuracy can be improved. Moreover, we introduce a spatial compatibility attention bias into the self-attention/cross-attention mechanism of KVPFormer to better model the spatial interactions between entities. With these new techniques, our proposed KVPFormer achieves state-of-the-art results on the FUNSD and XFUND datasets, outperforming the previous best-performing method by 7.2% and 13.2% in F1 score, respectively.
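The coarse-to-fine idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how either stage scores candidates, so the dot-product coarse score, the distance-based fine score, and every function name below are assumptions chosen only to show a shortlist being built and then re-ranked.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product, used here as an assumed coarse-stage score."""
    return sum(x * y for x, y in zip(a, b))


def coarse_candidates(question: Vector, entities: List[Vector], k: int) -> List[int]:
    """Coarse stage: return the indices of the k highest-scoring entities."""
    scores = [dot(question, e) for e in entities]
    ranked = sorted(range(len(entities)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def fine_select(
    question: Vector,
    entities: List[Vector],
    candidates: List[int],
    fine_score: Callable[[Vector, Vector], float],
) -> int:
    """Fine stage: re-score only the shortlisted candidates and pick the best."""
    return max(candidates, key=lambda i: fine_score(question, entities[i]))


def negative_squared_distance(q: Vector, e: Vector) -> float:
    """Hypothetical fine-stage scorer; a learned classifier would go here."""
    return -sum((x - y) ** 2 for x, y in zip(q, e))


# Toy embeddings: one question (key entity) and three possible value entities.
question = [1.0, 0.0]
entities = [[0.9, 0.1], [3.0, 0.0], [0.0, 1.0]]

shortlist = coarse_candidates(question, entities, k=2)
answer = fine_select(question, entities, shortlist, negative_squared_distance)
print(shortlist, answer)
```

In this toy example the coarse stage ranks entity 1 first, while the fine stage, looking only at the two shortlisted entities, prefers entity 0. The second stage only has to discriminate among a few candidates instead of all entities, which is the sense in which the learning problem becomes easier.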

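An attention bias of the kind the abstract mentions is a term added to the attention logits before the softmax. The abstract does not describe how KVPFormer computes its spatial compatibility term, so the sketch below substitutes a fixed, hand-written bias based on the distance between bounding-box centers. The box format, the `weight` parameter, and all function names are assumptions for illustration only.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x_min, y_min, x_max, y_max)


def center(box: Box) -> Tuple[float, float]:
    """Center point of a bounding box."""
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


def spatial_bias(query_box: Box, key_box: Box, weight: float = 1.0) -> float:
    """Hypothetical bias: the farther apart two boxes are, the more negative it is."""
    (qx, qy), (kx, ky) = center(query_box), center(key_box)
    return -weight * math.hypot(qx - kx, qy - ky)


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def biased_attention_weights(
    query: Sequence[float],
    keys: List[Sequence[float]],
    query_box: Box,
    key_boxes: List[Box],
    weight: float = 1.0,
) -> List[float]:
    """Scaled dot-product attention weights with an additive spatial bias."""
    scale = math.sqrt(len(query))
    logits = [
        sum(q * k for q, k in zip(query, key)) / scale
        + spatial_bias(query_box, key_box, weight)
        for key, key_box in zip(keys, key_boxes)
    ]
    return softmax(logits)


# Two keys with identical content embeddings; only their positions differ.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [1.0, 0.0]]
query_box = (0.0, 0.0, 10.0, 10.0)
key_boxes = [(12.0, 0.0, 22.0, 10.0), (200.0, 0.0, 210.0, 10.0)]

unbiased = biased_attention_weights(query, keys, query_box, key_boxes, weight=0.0)
biased = biased_attention_weights(query, keys, query_box, key_boxes, weight=0.1)
print(unbiased, biased)
```

With `weight=0.0` the two keys receive equal attention because their embeddings are identical. With a positive weight the nearby box receives most of the attention, which shows how a layout-dependent term can separate entities whose text features alone do not.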
