Generalizability to new databases is vital for Text-to-SQL systems, which parse human utterances into SQL statements. Existing work pursues this goal through exact matching, which identifies lexical matches between question words and schema items. However, these methods fail in more challenging scenarios such as synonym substitution, where the surface forms of corresponding question words and schema items differ. In this paper, we propose ISESL-SQL, a framework that iteratively builds a semantically enhanced schema-linking graph between question tokens and database schemas. First, we extract a schema-linking graph from pre-trained language models (PLMs) through an unsupervised probing procedure. Then, the schema-linking graph is further optimized during training through a deep graph learning method. We also design an auxiliary task, graph regularization, to improve the schema information mentioned in the schema-linking graph. Extensive experiments on three benchmarks demonstrate that ISESL-SQL consistently outperforms the baselines, and further investigations show its generalizability and robustness.
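To make the failure mode of exact matching concrete, the following is a minimal sketch of a lexical schema linker. It is an illustration only, not the ISESL-SQL method or any baseline's actual implementation; the function name `exact_match_links`, the toy schema, and the example questions are all assumptions introduced here.

```python
import re
from typing import List, Tuple


def exact_match_links(question: str, schema_items: List[str]) -> List[Tuple[str, str]]:
    """Link a question token to a schema item when the token appears
    verbatim among the words of the schema item's name (hypothetical helper)."""
    # Lowercase and keep alphanumeric runs only, so "singer?" becomes "singer".
    tokens = re.findall(r"[a-z0-9]+", question.lower())
    links = []
    for item in schema_items:
        # Split names such as "stadium_id" into ["stadium", "id"].
        item_words = re.findall(r"[a-z0-9]+", item.lower())
        for tok in tokens:
            if tok in item_words:
                links.append((tok, item))
    return links


# Toy schema (assumed for illustration).
schema = ["singer", "name", "age"]

# Surface forms overlap, so exact matching finds both links.
matched = exact_match_links("What is the age of each singer?", schema)

# Synonym substitution: "old" and "vocalist" share no surface form with
# "age" and "singer", so exact matching finds nothing.
missed = exact_match_links("How old is each vocalist?", schema)
```

In this sketch the second question yields no links even though it asks for the same columns, which is the gap a semantically enhanced schema-linking graph is meant to close.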