This work tackles the challenging problem of heterogeneous graph encoding in the text-to-SQL task. Previous methods are typically node-centric and merely use different weight matrices to parameterize edge types, which 1) ignores the rich semantics embedded in the topological structure of edges, and 2) fails to distinguish local from non-local relations for each node. To this end, we propose a Line Graph Enhanced Text-to-SQL (LGESQL) model that mines the underlying relational features without constructing meta-paths. By virtue of the line graph, messages propagate more efficiently not only through connections between nodes but also through the topology of directed edges. Furthermore, local and non-local relations are integrated distinctively during the graph iteration. We also design an auxiliary task, called graph pruning, to improve the discriminative capability of the encoder. At the time of writing, our framework achieves state-of-the-art results (62.8% with GloVe, 72.0% with ELECTRA) on the cross-domain text-to-SQL benchmark Spider.
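To make the line-graph idea concrete, the following is a minimal sketch of the standard construction for a directed graph: every edge becomes a node, and edge (u, v) links to edge (v, w) whenever the first ends where the second begins. The function name `build_line_graph`, the `allow_backtracking` option, and the toy node names are illustrative assumptions and are not taken from the LGESQL implementation.

```python
from collections import defaultdict


def build_line_graph(edges, allow_backtracking=True):
    """Return the line graph of a directed graph as an adjacency dict.

    Each directed edge (u, v) becomes a line-graph node. It points to every
    edge (v, w) that starts where (u, v) ends. With allow_backtracking=False
    (a hypothetical option), the reverse edge (v, u) is skipped.
    """
    # Index edges by their source node for quick successor lookup.
    outgoing = defaultdict(list)
    for u, v in edges:
        outgoing[u].append((u, v))

    line_graph = {}
    for u, v in edges:
        line_graph[(u, v)] = [
            e for e in outgoing[v] if allow_backtracking or e[1] != u
        ]
    return line_graph


# Hypothetical toy graph: a question token, a column, and a table.
edges = [
    ("q_name", "col_name"),
    ("col_name", "tbl_singer"),
    ("tbl_singer", "col_name"),
]
lg = build_line_graph(edges)
lg_no_back = build_line_graph(edges, allow_backtracking=False)
```

In this toy graph, the edge from `q_name` to `col_name` gains a single line-graph neighbour, the edge from `col_name` to `tbl_singer`, so information attached to one relation can reach the next relation along a path directly.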