跨域文本到 SQL 剖析用等级 SQL 到问题生成数据增强 (Data Augmentation with Hierarchical SQL-to-Question Generation for Cross-domain Text-to-SQL Parsing) - 专知论文

会员服务 ·

0

数据增强 · SQL · 生成模型 · Performer · Better ·

2021 年 3 月 8 日

Data Augmentation with Hierarchical SQL-to-Question Generation for Cross-domain Text-to-SQL Parsing

翻译：跨域文本到 SQL 剖析用等级 SQL 到问题生成数据增强

Ao Zhang,Kun Wu,Lijie Wang,Zhenghua Li,Xinyan Xiao,Hua Wu,Min Zhang,Haifeng Wang

Data augmentation has attracted a lot of research attention in the deep learning era for its ability in alleviating data sparseness. The lack of data for unseen evaluation databases is exactly the major challenge for cross-domain text-to-SQL parsing. Previous works either require human intervention to guarantee the quality of generated data, or fail to handle complex SQL queries. This paper presents a simple yet effective data augmentation framework. First, given a database, we automatically produce a large amount of SQL queries based on an abstract syntax tree grammar. We require the generated queries cover at least 80% of SQL patterns in the training data for better distribution matching. Second, we propose a hierarchical SQL-to-question generation model to obtain high-quality natural language questions, which is the major contribution of this work. Experiments on three cross-domain datasets, i.e., WikiSQL and Spider in English, and DuSQL in Chinese, show that our proposed data augmentation framework can consistently improve performance over strong baselines, and in particular the hierarchical generation model is the key for the improvement.

翻译：在深层次的学习时代,数据扩增因其在减轻数据稀少方面的能力而吸引了大量的研究关注。缺乏用于隐性评估数据库的数据正是跨域文本到 SQL 解析的主要挑战。以前的工作要么需要人手干预来保证生成数据的质量,要么无法处理复杂的 SQL 查询。本文提出了一个简单而有效的数据扩增框架。首先, 根据一个数据库, 我们自动产生大量基于抽象语法树语法的 SQL 查询。我们要求生成的查询至少涵盖培训数据中至少80%的 SQL 模式, 以便更好地匹配分布。其次, 我们提出一个等级性 SQL 到问题生成模型, 以获得高质量的自然语言问题, 这是这项工作的主要贡献。在三个跨域数据集上进行的实验, 即英语的WikisQL 和蜘蛛以及中文的 DusQL, 表明我们提出的数据扩增框架可以不断提高强基线的性能, 特别是等级生成模型是改进的关键。

0

相关内容

数据增强

数据增强在机器学习领域多指采用一些方法（比如数据蒸馏，正负样本均衡等）来提高模型数据集的质量，增强数据。

【经典书】线性代数，436页pdf

专知会员服务

77+阅读 · 2021年3月16日

【干货书】机器学习速查手册，135页pdf

【干货书】机器学习速查手册，135页pdf

专知会员服务

127+阅读 · 2020年11月20日

【ACL2020】用于生成深度问题的语义图，Semantic Graphs for Generating Deep Questions

【ACL2020】用于生成深度问题的语义图，Semantic Graphs for Generating Deep Questions

专知会员服务

26+阅读 · 2020年5月5日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

机器学习速查手册，135页pdf

机器学习速查手册，135页pdf

专知会员服务

342+阅读 · 2020年3月15日

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

专知会员服务

43+阅读 · 2020年1月28日

【AAAI2020】知识图谱的生成式对抗零样本关系学习，Generative Adversarial Zero-Shot Relational Learning for Knowledge Graphs

【AAAI2020】知识图谱的生成式对抗零样本关系学习，Generative Adversarial Zero-Shot Relational Learning for Knowledge Graphs

专知会员服务

64+阅读 · 2020年1月11日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

深度学习自然语言处理

7+阅读 · 2020年4月8日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

【论文推荐】最新六篇自动问答相关论文—排序函数、文本摘要评估、信息抽取框架、层次递归编码器、半监督问答

【论文推荐】最新六篇自动问答相关论文—排序函数、文本摘要评估、信息抽取框架、层次递归编码器、半监督问答

专知

9+阅读 · 2018年5月10日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Generative Adversarial Text to Image Synthesis论文解读

Generative Adversarial Text to Image Synthesis论文解读

统计学习与视觉计算组

13+阅读 · 2017年6月9日

Cross-Domain Few-Shot Classification via Adversarial Task Augmentation

Cross-Domain Few-Shot Classification via Adversarial Task Augmentation

Arxiv

0+阅读 · 2021年4月29日

Data Augmentation for Graph Neural Networks

Arxiv

38+阅读 · 2020年12月2日

Hierarchical Human Parsing with Typed Part-Relation Reasoning

Hierarchical Human Parsing with Typed Part-Relation Reasoning

Arxiv

6+阅读 · 2020年3月10日

Few-shot Natural Language Generation for Task-Oriented Dialog

Few-shot Natural Language Generation for Task-Oriented Dialog

Arxiv

30+阅读 · 2020年2月27日

A Survey and Taxonomy of Adversarial Neural Networks for Text-to-Image Synthesis

A Survey and Taxonomy of Adversarial Neural Networks for Text-to-Image Synthesis

Arxiv

7+阅读 · 2019年10月21日

A Sketch-Based System for Semantic Parsing

A Sketch-Based System for Semantic Parsing

Arxiv

4+阅读 · 2019年9月12日

Pixel Level Data Augmentation for Semantic Image Segmentation using Generative Adversarial Networks

Pixel Level Data Augmentation for Semantic Image Segmentation using Generative Adversarial Networks

Arxiv

5+阅读 · 2019年2月8日

Hierarchical Generative Modeling for Controllable Speech Synthesis

Hierarchical Generative Modeling for Controllable Speech Synthesis

Arxiv

3+阅读 · 2018年12月27日

Bidirectional Attention for SQL Generation

Bidirectional Attention for SQL Generation

Arxiv

4+阅读 · 2018年6月21日

Deep Semantic Hashing with Generative Adversarial Networks

Arxiv

5+阅读 · 2018年4月23日

VIP会员

文章信息

相关主题

相关VIP内容

【经典书】线性代数，436页pdf

专知会员服务

77+阅读 · 2021年3月16日

【干货书】机器学习速查手册，135页pdf

【干货书】机器学习速查手册，135页pdf

专知会员服务

127+阅读 · 2020年11月20日

【ACL2020】用于生成深度问题的语义图，Semantic Graphs for Generating Deep Questions

【ACL2020】用于生成深度问题的语义图，Semantic Graphs for Generating Deep Questions

专知会员服务

26+阅读 · 2020年5月5日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

机器学习速查手册，135页pdf

机器学习速查手册，135页pdf

专知会员服务

342+阅读 · 2020年3月15日

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

专知会员服务

43+阅读 · 2020年1月28日

【AAAI2020】知识图谱的生成式对抗零样本关系学习，Generative Adversarial Zero-Shot Relational Learning for Knowledge Graphs

【AAAI2020】知识图谱的生成式对抗零样本关系学习，Generative Adversarial Zero-Shot Relational Learning for Knowledge Graphs

专知会员服务

64+阅读 · 2020年1月11日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

热门VIP内容

开通专知VIP会员享更多权益服务

《美陆军特种作战条令》最新102页

《洛克希德SR-71“黑鸟”侦察机动力系统》21页slides

美空军作战实验室通过人工智能和指挥控制技术创新推进杀伤链

《指挥控制能力分析方法论》最新报告

相关资讯

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

深度学习自然语言处理

7+阅读 · 2020年4月8日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

【论文推荐】最新六篇自动问答相关论文—排序函数、文本摘要评估、信息抽取框架、层次递归编码器、半监督问答

【论文推荐】最新六篇自动问答相关论文—排序函数、文本摘要评估、信息抽取框架、层次递归编码器、半监督问答

专知

9+阅读 · 2018年5月10日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Generative Adversarial Text to Image Synthesis论文解读

Generative Adversarial Text to Image Synthesis论文解读

统计学习与视觉计算组

13+阅读 · 2017年6月9日

相关论文

Cross-Domain Few-Shot Classification via Adversarial Task Augmentation

Cross-Domain Few-Shot Classification via Adversarial Task Augmentation

Arxiv

0+阅读 · 2021年4月29日

Data Augmentation for Graph Neural Networks

Arxiv

38+阅读 · 2020年12月2日

Hierarchical Human Parsing with Typed Part-Relation Reasoning

Hierarchical Human Parsing with Typed Part-Relation Reasoning

Arxiv

6+阅读 · 2020年3月10日

Few-shot Natural Language Generation for Task-Oriented Dialog

Few-shot Natural Language Generation for Task-Oriented Dialog

Arxiv

30+阅读 · 2020年2月27日

A Survey and Taxonomy of Adversarial Neural Networks for Text-to-Image Synthesis

A Survey and Taxonomy of Adversarial Neural Networks for Text-to-Image Synthesis

Arxiv

7+阅读 · 2019年10月21日

A Sketch-Based System for Semantic Parsing

A Sketch-Based System for Semantic Parsing

Arxiv

4+阅读 · 2019年9月12日

Pixel Level Data Augmentation for Semantic Image Segmentation using Generative Adversarial Networks

Pixel Level Data Augmentation for Semantic Image Segmentation using Generative Adversarial Networks

Arxiv

5+阅读 · 2019年2月8日

Hierarchical Generative Modeling for Controllable Speech Synthesis

Hierarchical Generative Modeling for Controllable Speech Synthesis

Arxiv

3+阅读 · 2018年12月27日

Bidirectional Attention for SQL Generation

Bidirectional Attention for SQL Generation

Arxiv

4+阅读 · 2018年6月21日

Deep Semantic Hashing with Generative Adversarial Networks

Arxiv

5+阅读 · 2018年4月23日

微信扫码咨询专知VIP会员