Pre-trained language models have recently produced some of the strongest results on Text-to-SQL. Because SQL queries are structured, the seq2seq model must parse both the schema items (i.e., tables and columns) and the skeleton (i.e., SQL keywords) at the same time. Coupling these targets makes it harder to parse correct SQL queries, especially when they involve many schema items and logic operators. This paper proposes a ranking-enhanced encoding and skeleton-aware decoding framework that decouples schema linking from skeleton parsing. Specifically, for a seq2seq encoder-decoder model, the encoder is injected with only the most relevant schema items rather than the full unordered schema, which could reduce the schema linking effort during SQL parsing; the decoder first generates the skeleton and then the actual SQL query, which could implicitly constrain SQL parsing. We evaluate the proposed framework on Spider and three of its robustness variants: Spider-DK, Spider-Syn, and Spider-Realistic. The experimental results show that our framework delivers promising performance and robustness. Our code is available at https://github.com/RUCKBReasoning/RESDSQL.
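To make the two ideas concrete, the following is a minimal Python sketch of how a ranked encoder input and a skeleton-first decoder target might be assembled. It is an illustration under stated assumptions, not the implementation in the linked repository: the relevance scores are hand-written placeholders standing in for whatever ranker produces them, and the function names, the `|` and `:` separators, the keyword list, and the whitespace tokenization are all hypothetical choices made for this example.

```python
from typing import Dict, List, Tuple

# Assumed (simplified) set of tokens that belong to the SQL skeleton.
SKELETON_TOKENS = {
    "select", "from", "where", "group", "order", "by", "having", "limit",
    "join", "on", "and", "or", "not", "in", "like", "between", "distinct",
    "as", "asc", "desc", "count", "max", "min", "avg", "sum",
    "union", "intersect", "except", "(", ")", "=", ">", "<", ">=", "<=", "!=",
}


def top_k_schema(
    table_scores: Dict[str, float],
    column_scores: Dict[str, Dict[str, float]],
    k_tables: int,
    k_columns: int,
) -> List[Tuple[str, List[str]]]:
    """Keep the k highest-scoring tables and, per table, the k highest-scoring columns."""
    tables = sorted(table_scores, key=table_scores.get, reverse=True)[:k_tables]
    ranked = []
    for table in tables:
        cols = column_scores[table]
        ranked.append((table, sorted(cols, key=cols.get, reverse=True)[:k_columns]))
    return ranked


def build_encoder_input(question: str, ranked: List[Tuple[str, List[str]]]) -> str:
    """Serialize the question followed by the ranked schema items."""
    schema = " | ".join(f"{table} : {' , '.join(cols)}" for table, cols in ranked)
    return f"{question} | {schema}"


def extract_skeleton(sql: str) -> str:
    """Replace every run of non-skeleton tokens with a single '_' placeholder."""
    out: List[str] = []
    for tok in sql.lower().split():
        if tok in SKELETON_TOKENS:
            out.append(tok)
        elif not out or out[-1] != "_":
            out.append("_")
    return " ".join(out)


def build_decoder_target(sql: str) -> str:
    """Skeleton first, then the full SQL query."""
    return f"{extract_skeleton(sql)} | {sql}"


# Placeholder scores; a real system would obtain them from a trained ranker.
table_scores = {"concert": 0.10, "singer": 0.95, "stadium": 0.05}
column_scores = {
    "singer": {"singer_id": 0.40, "name": 0.20, "age": 0.90},
    "concert": {"concert_id": 0.30, "year": 0.10},
    "stadium": {"stadium_id": 0.20, "capacity": 0.10},
}

ranked = top_k_schema(table_scores, column_scores, k_tables=2, k_columns=2)
encoder_input = build_encoder_input("How many singers are older than 30?", ranked)
decoder_target = build_decoder_target("select count ( * ) from singer where age > 30")

print(encoder_input)
print(decoder_target)
```

In this sketch the encoder sees only the `singer` and `concert` tables, most relevant first, and the decoder target begins with `select count ( _ ) from _ where _ > _` before the full query, so the keyword structure is committed to before any schema item is filled in.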