The Text-to-SQL task, which aims to translate natural language questions into SQL queries, has drawn much attention recently. One of its most challenging problems is generalizing a trained model to unseen database schemas, known as the cross-domain Text-to-SQL task. The key lies in the generalizability of (i) the encoding method used to model the question and the database schema and (ii) the question-schema linking method used to learn the mapping between words in the question and tables/columns in the database schema. To address these two issues, we propose a Structure-Aware Dual Graph Aggregation Network (SADGA) for cross-domain Text-to-SQL. In SADGA, we adopt a graph structure to provide a unified encoding model for both the natural language question and the database schema. Based on this unified modeling, we further devise a structure-aware aggregation method to learn the mapping between the question-graph and the schema-graph. The structure-aware aggregation method features Global Graph Linking, Local Graph Linking, and a Dual-Graph Aggregation Mechanism. We study the performance of our proposal empirically, and SADGA achieved 3rd place on the challenging Text-to-SQL benchmark Spider at the time of writing.
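To make the idea of a unified graph view concrete, the following is a minimal sketch, not the SADGA implementation. It assumes a question-graph whose nodes are words linked to their neighbours, a schema-graph whose nodes are tables and columns linked by table-column membership, and a hand-written lexical score standing in for the learned linking between the two graphs. All names (`Graph`, `build_question_graph`, `build_schema_graph`, `link_score`, `link_graphs`) and the example schema are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Graph:
    """A tiny undirected graph: node labels plus index-pair edges."""
    nodes: List[str]
    edges: Set[Tuple[int, int]] = field(default_factory=set)


def build_question_graph(question: str) -> Graph:
    # Assumption: nodes are lowercased words, edges join adjacent words.
    words = question.lower().rstrip("?").split()
    edges = {(i, i + 1) for i in range(len(words) - 1)}
    return Graph(words, edges)


def build_schema_graph(schema: Dict[str, List[str]]) -> Graph:
    # Assumption: one node per table and per column,
    # with an edge from each table to each of its columns.
    nodes: List[str] = []
    edges: Set[Tuple[int, int]] = set()
    for table, columns in schema.items():
        table_idx = len(nodes)
        nodes.append(table)
        for column in columns:
            nodes.append(f"{table}.{column}")
            edges.add((table_idx, len(nodes) - 1))
    return Graph(nodes, edges)


def link_score(word: str, schema_node: str) -> float:
    # Hypothetical lexical stand-in for a learned linking score:
    # 1.0 for an exact name match, 0.5 for a match on one
    # underscore-separated part of the name, 0.0 otherwise.
    name = schema_node.split(".")[-1].lower()
    if word == name:
        return 1.0
    if word in name.split("_"):
        return 0.5
    return 0.0


def link_graphs(question: Graph, schema: Graph) -> Dict[int, List[Tuple[str, float]]]:
    # For every question node, list the schema nodes with a non-zero
    # score, best match first. Unmatched words are left out.
    links: Dict[int, List[Tuple[str, float]]] = {}
    for i, word in enumerate(question.nodes):
        scored = [(node, link_score(word, node)) for node in schema.nodes]
        matches = sorted((m for m in scored if m[1] > 0), key=lambda m: -m[1])
        if matches:
            links[i] = matches
    return links


example_schema = {
    "singer": ["name", "age"],
    "concert": ["concert_name", "year"],
}
q_graph = build_question_graph("Show the name of each singer?")
s_graph = build_schema_graph(example_schema)
for idx, matches in link_graphs(q_graph, s_graph).items():
    print(q_graph.nodes[idx], "->", matches)
```

Because both inputs share the same `Graph` type, the linking step can treat question words and schema items uniformly. A learned model would replace `link_score` with scores computed from node representations and would also use the edges, which this sketch builds but does not exploit.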