类似表格文件的任意查询的数值检索 (Value Retrieval with Arbitrary Queries for Form-like Documents) - 专知论文

会员服务 ·

0

可理解性 · 可约的 · MoDELS · 语言模型化 · Performer ·

2021 年 12 月 15 日

Value Retrieval with Arbitrary Queries for Form-like Documents

翻译：类似表格文件的任意查询的数值检索

Mingfei Gao,Le Xue,Chetan Ramaiah,Chen Xing,Ran Xu,Caiming Xiong

We propose value retrieval with arbitrary queries for form-like documents to reduce human effort of processing forms. Unlike previous methods that only address a fixed set of field items, our method predicts target value for an arbitrary query based on the understanding of layout and semantics of a form. To further boost model performance, we propose a simple document language modeling (simpleDLM) strategy to improve document understanding on large-scale model pre-training. Experimental results show that our method outperforms our baselines significantly and the simpleDLM further improves our performance on value retrieval by around 17\% F1 score compared with the state-of-the-art pre-training method. Code will be made publicly available.

翻译：我们建议以任意查询方式检索类似表格的文件,以减少人的处理表格。与以往只处理一套固定的实地项目的方法不同,我们的方法根据对表格的布局和语义的理解,预测了任意查询的目标值。为了进一步提高模型性能,我们提议了一个简单的文件语言建模战略(simpleDLM),以提高对大规模示范培训前文件的理解。实验结果表明,我们的方法大大超过我们的基线,而简单的DLM进一步使我们的数值检索业绩比最先进的培训前方法提高了约17 ⁇ F1分。守则将公布于众。

0

相关内容

可理解性

【ACL2021】Hi-Transformer：一种具有层次化和交互式特点的长文档建模结构

专知会员服务

13+阅读 · 2021年8月4日

【PKDD2021】成对偏好学习，109页ppt，Pairwise Preference Learning

专知会员服务

21+阅读 · 2021年6月10日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

专知会员服务

28+阅读 · 2020年2月12日

【O'Reilly TensorFlow Conference 2019】TensorFlow，开源和IBM（TensorFlow, open source, and IBM ），IBM | Fred Reiss

【O'Reilly TensorFlow Conference 2019】TensorFlow，开源和IBM（TensorFlow, open source, and IBM ），IBM | Fred Reiss

专知会员服务

11+阅读 · 2019年11月14日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

已删除

将门创投

4+阅读 · 2019年6月5日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Jointly Optimizing Query Encoder and Product Quantization to Improve Retrieval Performance

Arxiv

6+阅读 · 2021年8月2日

Graph-based Hierarchical Relevance Matching Signals for Ad-hoc Retrieval

Arxiv

10+阅读 · 2021年2月22日

PROP: Pre-training with Representative Words Prediction for Ad-hoc Retrieval

Arxiv

11+阅读 · 2020年10月20日

CEDR: Contextualized Embeddings for Document Ranking

Arxiv

4+阅读 · 2019年8月19日

Passage Re-ranking with BERT

Arxiv

4+阅读 · 2019年2月18日

Learning to Rank Query Graphs for Complex Question Answering over Knowledge Graphs

Arxiv

4+阅读 · 2018年11月2日

Dialog-based Interactive Image Retrieval

Arxiv

5+阅读 · 2018年5月1日

Learning Cross-Modal Deep Embeddings for Multi-Object Image Retrieval using Text and Sketch

Arxiv

5+阅读 · 2018年4月28日

Text-to-Clip Video Retrieval with Early Fusion and Re-Captioning

Arxiv

4+阅读 · 2018年4月13日

Pay More Attention - Neural Architectures for Question-Answering

Arxiv

5+阅读 · 2018年3月25日

VIP会员

文章信息

相关主题

语言模型化

相关VIP内容

【ACL2021】Hi-Transformer：一种具有层次化和交互式特点的长文档建模结构

专知会员服务

13+阅读 · 2021年8月4日

【PKDD2021】成对偏好学习，109页ppt，Pairwise Preference Learning

专知会员服务

21+阅读 · 2021年6月10日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

专知会员服务

28+阅读 · 2020年2月12日

【O'Reilly TensorFlow Conference 2019】TensorFlow，开源和IBM（TensorFlow, open source, and IBM ），IBM | Fred Reiss

【O'Reilly TensorFlow Conference 2019】TensorFlow，开源和IBM（TensorFlow, open source, and IBM ），IBM | Fred Reiss

专知会员服务

11+阅读 · 2019年11月14日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

热门VIP内容

开通专知VIP会员享更多权益服务

【博士论文】低维与高维空间中潜在表征的分析、建模与变换

《生态建模密码破译：建模与编程实践》美陆军最新报告

大模型解决方案白皮书：社交陪伴场景全流程落地指南

面向具身操作的视觉-语言-动作模型综述

相关资讯

已删除

将门创投

4+阅读 · 2019年6月5日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

相关论文

Jointly Optimizing Query Encoder and Product Quantization to Improve Retrieval Performance

Arxiv

6+阅读 · 2021年8月2日

Graph-based Hierarchical Relevance Matching Signals for Ad-hoc Retrieval

Arxiv

10+阅读 · 2021年2月22日

PROP: Pre-training with Representative Words Prediction for Ad-hoc Retrieval

Arxiv

11+阅读 · 2020年10月20日

CEDR: Contextualized Embeddings for Document Ranking

Arxiv

4+阅读 · 2019年8月19日

Passage Re-ranking with BERT

Arxiv

4+阅读 · 2019年2月18日

Learning to Rank Query Graphs for Complex Question Answering over Knowledge Graphs

Arxiv

4+阅读 · 2018年11月2日

Dialog-based Interactive Image Retrieval

Arxiv

5+阅读 · 2018年5月1日

Learning Cross-Modal Deep Embeddings for Multi-Object Image Retrieval using Text and Sketch

Arxiv

5+阅读 · 2018年4月28日

Text-to-Clip Video Retrieval with Early Fusion and Re-Captioning

Arxiv

4+阅读 · 2018年4月13日

Pay More Attention - Neural Architectures for Question-Answering

Arxiv

5+阅读 · 2018年3月25日

微信扫码咨询专知VIP会员