用于文件spananers 的语法 (Grammars for Document Spanners) - 专知论文

会员服务 ·

0

正则化项 · Extensibility · 类别 · 正则表达式 · 词元分析器 ·

2021 年 3 月 13 日

Grammars for Document Spanners

翻译：用于文件spananers 的语法

Liat Peterfreund

We propose a new grammar-based language for defining information-extractors from documents (text) that is built upon the well-studied framework of document spanners for extracting structured data from text. While previously studied formalisms for document spanners are mainly based on regular expressions, we use an extension of context-free grammars, called {extraction grammars}, to define the new class of context-free spanners. Extraction grammars are simply context-free grammars extended with variables that capture interval positions of the document, namely spans. While regular expressions are efficient for tokenizing and tagging, context-free grammars are also efficient for capturing structural properties. Indeed, we show that context-free spanners are strictly more expressive than their regular counterparts. We reason about the expressive power of our new class and present a pushdown-automata model that captures it. We show that extraction grammars can be evaluated with polynomial data complexity. Nevertheless, as the degree of the polynomial depends on the query, we present an enumeration algorithm for unambiguous extraction grammars that, after quintic preprocessing, outputs the results sequentially, without repetitions, with a constant delay between every two consecutive ones.

翻译：我们建议一种基于语法的新语言,用于定义文档(文本)的信息提取器。这种语言以经过仔细研究的文档显示器框架为基础,用于从文本中提取结构化数据。虽然以前研究过的文件显示器格式主要基于常规表达式,但我们使用无上下文语法的扩展,称为{extractaction gragramars},以定义新的无上下文穿行器类别。提取语法只是无上下文的语法扩展,变量可以捕捉文档的间隔位置,即跨度。虽然常规表达法对于象征和标记有效,但无上下文的语法对于捕捉结构属性也有效。事实上,我们显示无上下文的穿行器比正常的对口语法更加明确。我们说明了我们新阶级的外观力量,并展示了一种可以捕捉它的按下向下方图像模型。我们表明,提取语法可以用多数值的复杂性来评估。然而,多式语法的程度取决于查询和标记,而每次连续的顺序分析结果,我们提出一个连续的顺序分析,在连续的顺序分析后,我们提出一个不重复的顺序分析。

0

相关内容

正则化项

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

【Manning新书】现代Java实战，592页pdf

【Manning新书】现代Java实战，592页pdf

专知会员服务

101+阅读 · 2020年5月22日

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

专知会员服务

108+阅读 · 2020年5月1日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

专知会员服务

93+阅读 · 2020年2月12日

【NLP| 推荐文章】基于文本和知识库的语义搜索（Semantic search on text and knowledge bases）

专知会员服务

46+阅读 · 2019年11月24日

【ACL 2019 Tutorials】政治文本的计算性分析：沟通不同领域的研究成果（Computational Analysis of Political Texts: Bridging Research Efforts Across Communities），GoranGlavaš,Federico Nanni,Simone Paolo Ponzetto

【ACL 2019 Tutorials】政治文本的计算性分析：沟通不同领域的研究成果（Computational Analysis of Political Texts: Bridging Research Efforts Across Communities），GoranGlavaš,Federico Nanni,Simone Paolo Ponzetto

专知会员服务

10+阅读 · 2019年11月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

TensorFlow 2.0 学习资源汇总

TensorFlow 2.0 学习资源汇总

专知会员服务

67+阅读 · 2019年10月9日

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

LibRec 精选：推荐系统的论文与源码

LibRec 精选：推荐系统的论文与源码

LibRec智能推荐

14+阅读 · 2018年11月29日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

gan生成图像at 1024² 的代码论文

gan生成图像at 1024² 的代码论文

CreateAMind

4+阅读 · 2017年10月31日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

AttentionFlow: Visualising Influence in Networks of Time Series

Arxiv

9+阅读 · 2021年2月3日

Dynamic Neuro-Symbolic Knowledge Graph Construction for Zero-shot Commonsense Question Answering

Dynamic Neuro-Symbolic Knowledge Graph Construction for Zero-shot Commonsense Question Answering

Arxiv

6+阅读 · 2020年10月30日

Reverse Engineering Configurations of Neural Text Generation Models

Arxiv

5+阅读 · 2020年4月13日

Learning to Weight for Text Classification

Learning to Weight for Text Classification

Arxiv

8+阅读 · 2019年3月28日

Incremental Reading for Question Answering

Incremental Reading for Question Answering

Arxiv

5+阅读 · 2019年1月15日

From direct tagging to Tagging with sentences compression

From direct tagging to Tagging with sentences compression

Arxiv

6+阅读 · 2018年10月5日

Phrase-Indexed Question Answering: A New Challenge for Scalable Document Comprehension

Arxiv

3+阅读 · 2018年4月20日

DDRprog: A CLEVR Differentiable Dynamic Reasoning Programmer

Arxiv

5+阅读 · 2018年3月30日

Subset Labeled LDA for Large-Scale Multi-Label Classification

Arxiv

3+阅读 · 2017年9月16日

Zero-Shot Transfer Learning for Event Extraction

Arxiv

10+阅读 · 2017年7月4日

VIP会员

文章信息

相关主题

正则表达式

词元分析器

相关VIP内容

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

【Manning新书】现代Java实战，592页pdf

【Manning新书】现代Java实战，592页pdf

专知会员服务

101+阅读 · 2020年5月22日

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

专知会员服务

108+阅读 · 2020年5月1日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

专知会员服务

93+阅读 · 2020年2月12日

【NLP| 推荐文章】基于文本和知识库的语义搜索（Semantic search on text and knowledge bases）

专知会员服务

46+阅读 · 2019年11月24日

【ACL 2019 Tutorials】政治文本的计算性分析：沟通不同领域的研究成果（Computational Analysis of Political Texts: Bridging Research Efforts Across Communities），GoranGlavaš,Federico Nanni,Simone Paolo Ponzetto

【ACL 2019 Tutorials】政治文本的计算性分析：沟通不同领域的研究成果（Computational Analysis of Political Texts: Bridging Research Efforts Across Communities），GoranGlavaš,Federico Nanni,Simone Paolo Ponzetto

专知会员服务

10+阅读 · 2019年11月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

TensorFlow 2.0 学习资源汇总

TensorFlow 2.0 学习资源汇总

专知会员服务

67+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

人机协同时代的军事指挥控制演进

《英国智库：瓦解俄罗斯防空系统生产，夺回制空权》最新报告

《通过仿真与开源数据提升战略决策：机遇与局限》最新报告

《战术突击工具包：军队的“边缘”操作系统》报告

相关资讯

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

LibRec 精选：推荐系统的论文与源码

LibRec 精选：推荐系统的论文与源码

LibRec智能推荐

14+阅读 · 2018年11月29日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

gan生成图像at 1024² 的代码论文

gan生成图像at 1024² 的代码论文

CreateAMind

4+阅读 · 2017年10月31日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

AttentionFlow: Visualising Influence in Networks of Time Series

Arxiv

9+阅读 · 2021年2月3日

Dynamic Neuro-Symbolic Knowledge Graph Construction for Zero-shot Commonsense Question Answering

Dynamic Neuro-Symbolic Knowledge Graph Construction for Zero-shot Commonsense Question Answering

Arxiv

6+阅读 · 2020年10月30日

Reverse Engineering Configurations of Neural Text Generation Models

Arxiv

5+阅读 · 2020年4月13日

Learning to Weight for Text Classification

Learning to Weight for Text Classification

Arxiv

8+阅读 · 2019年3月28日

Incremental Reading for Question Answering

Incremental Reading for Question Answering

Arxiv

5+阅读 · 2019年1月15日

From direct tagging to Tagging with sentences compression

From direct tagging to Tagging with sentences compression

Arxiv

6+阅读 · 2018年10月5日

Phrase-Indexed Question Answering: A New Challenge for Scalable Document Comprehension

Arxiv

3+阅读 · 2018年4月20日

DDRprog: A CLEVR Differentiable Dynamic Reasoning Programmer

Arxiv

5+阅读 · 2018年3月30日

Subset Labeled LDA for Large-Scale Multi-Label Classification

Arxiv

3+阅读 · 2017年9月16日

Zero-Shot Transfer Learning for Event Extraction

Arxiv

10+阅读 · 2017年7月4日

微信扫码咨询专知VIP会员