AutoQA: From Databases To QA Semantic Parsers With Only Synthetic Training Data - Zhuanzhi Papers

June 8, 2021

AutoQA: From Databases To QA Semantic Parsers With Only Synthetic Training Data

Silei Xu, Sina J. Semnani, Giovanni Campagna, Monica S. Lam

From arXiv; to appear in EMNLP 2020

We propose AutoQA, a methodology and toolkit to generate semantic parsers that answer questions on databases, with no manual effort. Given a database schema and its data, AutoQA automatically generates a large set of high-quality questions for training that covers different database operations. It uses automatic paraphrasing combined with template-based parsing to find alternative expressions of an attribute in different parts of speech. It also uses a novel filtered auto-paraphraser to generate correct paraphrases of entire sentences. We apply AutoQA to the Schema2QA dataset and obtain an average logical form accuracy of 62.9% when tested on natural questions, which is only 6.4% lower than a model trained with expert natural language annotations and paraphrase data collected from crowdworkers. To demonstrate the generality of AutoQA, we also apply it to the Overnight dataset. AutoQA achieves 69.8% answer accuracy, 16.4% higher than the state-of-the-art zero-shot models and only 5.2% lower than the same model trained with human data.
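The abstract does not spell out how the paraphrase filter works. One plausible reading, sketched here purely as an assumption, is that a candidate paraphrase is kept only when a parser maps it back to the logical form of the sentence it was derived from. The names `filter_paraphrases`, `toy_paraphrase`, and `toy_parse`, as well as the logical-form syntax, are hypothetical stand-ins and are not taken from the AutoQA toolkit.

```python
from typing import Callable, Iterable, List, Tuple

Example = Tuple[str, str]  # (question, logical form)


def filter_paraphrases(
    examples: Iterable[Example],
    paraphrase: Callable[[str], List[str]],
    parse: Callable[[str], str],
) -> List[Example]:
    """Keep a paraphrase only if `parse` maps it to the original logical form."""
    kept: List[Example] = []
    for question, logical_form in examples:
        for candidate in paraphrase(question):
            # Discard trivial copies and candidates whose meaning drifted.
            if candidate != question and parse(candidate) == logical_form:
                kept.append((candidate, logical_form))
    return kept


# Toy stand-ins (hypothetical): a real system would use neural models here.
def toy_paraphrase(question: str) -> List[str]:
    table = {
        "show me restaurants that serve chinese food": [
            "find chinese restaurants",
            "show me restaurants near chinese embassies",  # meaning drifted
        ]
    }
    return table.get(question, [])


def toy_parse(question: str) -> str:
    if "chinese restaurants" in question or "serve chinese" in question:
        return 'Restaurant, servesCuisine == "chinese"'
    return "UNKNOWN"


synthetic = [
    (
        "show me restaurants that serve chinese food",
        'Restaurant, servesCuisine == "chinese"',
    )
]
augmented = filter_paraphrases(synthetic, toy_paraphrase, toy_parse)
print(augmented)
```

In this toy run only "find chinese restaurants" survives, because the second candidate no longer parses to the original logical form. Passing the paraphraser and parser in as plain callables keeps the filtering logic independent of whichever models are actually used.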

