The importance of building text-to-SQL parsers that can be applied to new databases has long been acknowledged, and a critical step toward this goal is schema linking, i.e., properly recognizing mentions of unseen columns or tables when generating SQL queries. In this work, we propose a novel framework to elicit relational structures from large-scale pre-trained language models (PLMs) via a probing procedure based on the Poincaré distance metric, and we use the induced relations to augment current graph-based parsers for better schema linking. Compared with commonly used rule-based methods for schema linking, we find that the probed relations robustly capture semantic correspondences, even when the surface forms of mentions and entities differ. Moreover, our probing procedure is entirely unsupervised and requires no additional parameters. Extensive experiments show that our framework achieves new state-of-the-art performance on three benchmarks. Through qualitative analysis, we empirically verify that our probing procedure can indeed find the desired relational structures. Our code can be found at https://github.com/AlibabaResearch/DAMO-ConvAI.
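For readers unfamiliar with the metric named above, the following is a minimal sketch of the standard Poincaré distance between two points inside the unit ball, d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))). It illustrates the metric only; the abstract does not describe how the distance is applied to PLM representations, so nothing here should be read as the authors' probing procedure. The function name `poincare_distance` and the toy 2-D points are assumptions made for illustration.

```python
import math


def poincare_distance(u, v):
    """Poincaré-ball distance between two points given as sequences of floats.

    Hypothetical helper for illustration; both points must lie strictly
    inside the unit ball (Euclidean norm < 1).
    """
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    # Squared Euclidean distance between the two points.
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


if __name__ == "__main__":
    origin = [0.0, 0.0]
    # The same Euclidean step costs more hyperbolic distance near the boundary.
    print(poincare_distance(origin, [0.5, 0.0]))
    print(poincare_distance([0.4, 0.0], [0.9, 0.0]))
```

Because the denominator shrinks as either point approaches the boundary of the ball, the same Euclidean gap corresponds to a larger hyperbolic distance there, which is the property that makes this metric a common choice for representing hierarchical or relational structure.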