The task of text-to-SQL is to convert a natural language question into its corresponding SQL query in the context of relational tables. Existing text-to-SQL parsers generate a "plausible" SQL query for an arbitrary user question, and therefore fail to handle problematic user questions correctly. To formalize this problem, we conduct a preliminary study of the ambiguous and unanswerable cases observed in text-to-SQL and summarize them into six feature categories. We then identify the causes behind each category and propose requirements for handling ambiguous and unanswerable questions. Following this study, we propose a simple yet effective counterfactual example generation approach for automatically generating ambiguous and unanswerable text-to-SQL examples. Furthermore, we propose a weakly supervised model, DTE (Detecting-Then-Explaining), for error detection, localization, and explanation. Experimental results show that our model achieves the best results on both real-world and generated examples compared with various baselines. We will release data and code for future research.