In text-to-SQL tasks, as in much of NLP, compositional generalization is a major challenge: neural networks struggle to generalize when the training and test distributions differ compositionally. However, most recent attempts to improve this rely on word-level synthetic data or specific dataset splits to generate compositional biases. In this work, we propose a clause-level compositional example generation method. We first split the sentences in the Spider text-to-SQL dataset into sub-sentences, annotating each sub-sentence with its corresponding SQL clause, resulting in a new dataset, Spider-SS. We then construct a further dataset, Spider-CG, by composing Spider-SS sub-sentences in different combinations, to test the ability of models to generalize compositionally. Experiments show that existing models suffer significant performance degradation when evaluated on Spider-CG, even though every sub-sentence is seen during training. To address this problem, we modify a number of state-of-the-art models to train on the segmented data of Spider-SS, and we show that this method improves their generalization performance.
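As a rough illustration of the clause-level idea, the following sketch shows what a Spider-SS-style annotation might look like and how sub-sentence/clause pairs could be recombined in the spirit of Spider-CG. The field names and the example question are hypothetical, not the actual dataset schema:

```python
# Hypothetical sketch of clause-level annotation in the style of Spider-SS.
# Each sub-sentence of a natural-language question is paired with the SQL
# clause it expresses; keys and values here are illustrative only.

annotated_example = [
    {"sub_sentence": "Show the names of singers",
     "sql_clause": "SELECT name FROM singer"},
    {"sub_sentence": "whose age is above 30",
     "sql_clause": "WHERE age > 30"},
]

def compose(segments):
    """Recombine sub-sentence annotations into a full question/SQL pair,
    mimicking (loosely) how Spider-CG composes Spider-SS segments."""
    question = " ".join(seg["sub_sentence"] for seg in segments)
    sql = " ".join(seg["sql_clause"] for seg in segments)
    return question, sql

question, sql = compose(annotated_example)
# question: "Show the names of singers whose age is above 30"
# sql:      "SELECT name FROM singer WHERE age > 30"
```

Swapping in a different modifier sub-sentence (e.g. one annotated with a different WHERE clause) yields a new composed example whose parts were all seen in training, which is exactly the generalization setting the abstract describes.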