In this paper, we propose STAR, a novel SQL-guided pre-training framework for context-dependent text-to-SQL parsing, which leverages contextual information to enrich natural language (NL) utterance and table schema representations for text-to-SQL conversations. Concretely, we propose two novel pre-training objectives that respectively explore the context-dependent interactions of NL utterances and SQL queries within each text-to-SQL conversation: (i) a schema state tracking (SST) objective that tracks the schema states of context-dependent SQL queries by predicting and updating the value of each schema slot during the interaction; (ii) an utterance dependency tracking (UDT) objective that employs weighted contrastive learning to pull together the representations of semantically similar NL utterances and push apart the representations of semantically dissimilar NL utterances within each conversation. In addition, we construct a high-quality, large-scale context-dependent text-to-SQL conversation corpus to pre-train STAR. Extensive experiments show that STAR achieves new state-of-the-art performance on two downstream benchmarks (SParC and CoSQL), significantly outperforming previous pre-training methods and ranking first on the leaderboard. We believe that releasing the constructed corpus, codebase, and pre-trained STAR checkpoints will advance research in this area. For reproducibility, we release our code and data at https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/star.
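To make the notion of a schema state concrete, the following is a minimal sketch of how per-turn slot values could be derived from the SQL queries of a conversation. It is not the STAR implementation. It assumes that each schema slot is a column name and that its value is the tuple of SQL clause keywords under which the column appears, with `NONE` for unused columns. The names `schema_state`, `changed_slots`, and `CLAUSE_PATTERN`, the keyword-based clause splitting, and the example table are all hypothetical.

```python
import re

# Assumption: a slot's value is the tuple of SQL clauses that mention it.
# Only a handful of clause keywords are handled in this sketch.
CLAUSE_PATTERN = re.compile(
    r"\b(SELECT|FROM|WHERE|GROUP BY|ORDER BY)\b", re.IGNORECASE
)
NONE_STATE = ("NONE",)


def schema_state(sql, schema_slots):
    """Return {slot: tuple of clause keywords whose body mentions the slot}."""
    # Splitting on a capturing group keeps the keywords:
    # parts = [prefix, keyword, body, keyword, body, ...]
    parts = CLAUSE_PATTERN.split(sql)
    clauses = list(zip(parts[1::2], parts[2::2]))
    state = {}
    for slot in schema_slots:
        mention = re.compile(r"\b" + re.escape(slot) + r"\b", re.IGNORECASE)
        hits = tuple(kw.upper() for kw, body in clauses if mention.search(body))
        state[slot] = hits or NONE_STATE
    return state


def changed_slots(previous, current):
    """List the slots whose value differs between two consecutive turns."""
    return sorted(s for s in current if previous.get(s) != current[s])


# Hypothetical two-turn conversation over a table with three columns.
slots = ["name", "age", "country"]
turn_1 = schema_state("SELECT name FROM singer", slots)
turn_2 = schema_state(
    "SELECT name FROM singer WHERE age > 30 ORDER BY age", slots
)
print(turn_2)
print(changed_slots(turn_1, turn_2))
```

The sketch only builds the slot values that a tracking objective would be asked to predict and update from the dialogue context. It does not model the prediction itself, and a real system would need a proper SQL parser rather than keyword splitting.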