Addressing the mismatch between natural language descriptions and their corresponding SQL queries is a key challenge in text-to-SQL translation. To bridge this gap, we propose an SQL intermediate representation (IR) called Natural SQL (NatSQL). NatSQL preserves the core functionality of SQL while simplifying queries in three ways: (1) dispensing with operators and keywords such as GROUP BY, HAVING, FROM, and JOIN ON, for which counterparts are usually hard to find in the text descriptions; (2) removing the need for nested subqueries and set operators; and (3) making schema linking easier by reducing the number of schema items required. On Spider, a challenging text-to-SQL benchmark that contains complex and nested SQL queries, we demonstrate that NatSQL outperforms other IRs and significantly improves the performance of several previous state-of-the-art (SOTA) models. Furthermore, for existing models that do not support executable SQL generation, NatSQL easily enables them to generate executable SQL queries and achieves new state-of-the-art execution accuracy.
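
To make simplification (1) concrete, the following is a minimal sketch of how a FROM clause could be recovered from a query that omits it, by collecting the tables named in table-qualified columns. It is not the authors' conversion procedure: the toy schema, the `FOREIGN_KEYS` map, the helper names, and the example query string are all assumptions made for illustration, and the query is only written in the spirit of a FROM-less IR rather than in verified NatSQL syntax.

```python
import re

# Hypothetical foreign-key map for a toy schema (an assumption, not from the paper):
# (table_a, table_b) -> join condition linking the two tables.
FOREIGN_KEYS = {
    ("student", "has_pet"): "student.stuid = has_pet.stuid",
}


def referenced_tables(query: str) -> list:
    """Return the tables mentioned as `table.column` or `table.*`, in first-seen order."""
    tables = []
    for table in re.findall(r"\b([A-Za-z_]\w*)\.(?:\w+|\*)", query):
        if table not in tables:
            tables.append(table)
    return tables


def infer_from_clause(query: str) -> str:
    """Build a FROM ... JOIN ... ON ... clause from the tables a query references."""
    tables = referenced_tables(query)
    if not tables:
        raise ValueError("no table-qualified columns found")
    clause = f"FROM {tables[0]}"
    joined = [tables[0]]
    for table in tables[1:]:
        condition = None
        # Look for a foreign key between the new table and any table already joined.
        for other in joined:
            condition = FOREIGN_KEYS.get((other, table)) or FOREIGN_KEYS.get((table, other))
            if condition:
                break
        if condition is None:
            raise ValueError(f"no join path to {table}")
        clause += f" JOIN {table} ON {condition}"
        joined.append(table)
    return clause


# Illustrative FROM-less query (hypothetical syntax).
simplified = "SELECT student.name WHERE count(has_pet.*) > 1"
print(infer_from_clause(simplified))
```

The sketch only handles direct foreign-key links between referenced tables. A real converter would also need to restore GROUP BY and HAVING, find join paths through intermediate tables, and rebuild subqueries and set operators, none of which is attempted here.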