The task of text-to-SQL parsing, which aims to convert natural language questions into executable SQL queries, has garnered increasing attention in recent years, as it can help end users efficiently extract vital information from databases without requiring a technical background. One of the major challenges in text-to-SQL parsing is domain generalization, i.e., how to generalize well to unseen databases. Recently, the pre-trained text-to-text transformer model T5, though not specialized for text-to-SQL parsing, has achieved state-of-the-art performance on standard benchmarks targeting domain generalization. In this work, we explore ways to further augment the pre-trained T5 model with specialized components for text-to-SQL parsing. Such components are expected to introduce structural inductive bias into text-to-SQL parsers, thus improving the model's capacity for (potentially multi-hop) reasoning, which is critical for generating structure-rich SQL queries. To this end, we propose a new architecture, GRAPHIX-T5, a mixed model in which the standard pre-trained transformer is augmented with specially designed graph-aware layers. Extensive experiments and analysis demonstrate the effectiveness of GRAPHIX-T5 across four text-to-SQL benchmarks: SPIDER, SYN, REALISTIC and DK. GRAPHIX-T5 surpasses all other T5-based parsers by a significant margin, achieving new state-of-the-art performance. Notably, GRAPHIX-T5-large outperforms the original T5-large by 5.7% on exact match (EM) accuracy and 6.6% on execution (EX) accuracy. It even outperforms T5-3B by 1.2% on EM and 1.5% on EX.