Proton: Probing Schema Linking Information from Pre-trained Language Models for Text-to-SQL Parsing

The importance of building text-to-SQL parsers that can be applied to new databases has long been acknowledged, and a critical step toward this goal is schema linking, i.e., properly recognizing mentions of unseen columns or tables when generating SQL queries. In this work, we propose a novel framework to elicit relational structures from large-scale pre-trained language models (PLMs) via a probing procedure based on the Poincaré distance metric, and we use the induced relations to augment current graph-based parsers for better schema linking. Compared with commonly used rule-based methods for schema linking, we find that the probed relations robustly capture semantic correspondences, even when the surface forms of mentions and entities differ. Moreover, our probing procedure is entirely unsupervised and requires no additional parameters. Extensive experiments show that our framework achieves new state-of-the-art performance on three benchmarks. Through qualitative analysis, we empirically verify that our probing procedure can indeed find the desired relational structures.
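
The abstract names the Poincaré distance as the metric behind the probing procedure but does not spell it out. As a minimal sketch, the standard distance between two points u and v inside the unit ball is arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))). The function name `poincare_distance` and the use of plain Python lists are illustrative assumptions, not the authors' code, and how PLM embeddings are mapped into the unit ball is not stated here and is left out.

```python
import math


def poincare_distance(u, v):
    """Poincaré distance between two points strictly inside the unit ball.

    Hypothetical helper for illustration only; not taken from the paper's code.
    """
    sq_u = sum(x * x for x in u)  # squared Euclidean norm of u
    sq_v = sum(x * x for x in v)  # squared Euclidean norm of v
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("both points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


origin = [0.0, 0.0]
near = [0.5, 0.0]
far = [0.9, 0.0]
# The same Euclidean step costs more hyperbolic distance near the boundary.
print(poincare_distance(origin, near))
print(poincare_distance(origin, far))
```

One plausible way to use such a metric for probing, stated here as an assumption rather than as the paper's exact procedure, is to compare a schema item's embedding computed with and without a given question token and to treat a large distance as evidence of a link between the two.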
