When translating natural language questions into SQL queries to answer questions from a database, contemporary semantic parsing models struggle to generalize to unseen database schemas. The generalization challenge lies in (a) encoding the database relations in an accessible way for the semantic parser, and (b) modeling alignment between database columns and their mentions in a given query. We present a unified framework, based on the relation-aware self-attention mechanism, to address schema encoding, schema linking, and feature representation within a text-to-SQL encoder. On the challenging Spider dataset this framework boosts the exact match accuracy to 57.2%, surpassing its best counterparts by 8.7% absolute improvement. Further augmented with BERT, it achieves the new state-of-the-art performance of 65.6% on the Spider leaderboard. In addition, we observe qualitative improvements in the model's understanding of schema linking and alignment. Our implementation will be open-sourced at https://github.com/Microsoft/rat-sql.
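The relation-aware self-attention the abstract refers to extends standard self-attention (in the style of Shaw et al., 2018) by injecting a learned embedding for each pairwise relation (e.g. a foreign-key link between two columns, or an exact-match link between a question token and a column name) into both the attention scores and the aggregated values. A minimal single-head NumPy sketch, with illustrative names and shapes of my own choosing rather than the authors' implementation:

```python
import numpy as np

def relation_aware_attention(x, rel_k, rel_v, wq, wk, wv):
    """One head of relation-aware self-attention (a sketch).

    x:             (n, d) item states (question tokens, columns, tables)
    rel_k, rel_v:  (n, n, d) embeddings of the relation between items i, j
                   (schema links, foreign keys, name matches, ...)
    wq, wk, wv:    (d, d) query/key/value projections
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    d = q.shape[-1]
    # Relation embeddings bias the compatibility score of every pair (i, j)...
    scores = (q[:, None, :] * (k[None, :, :] + rel_k)).sum(-1) / np.sqrt(d)
    alpha = np.exp(scores - scores.max(-1, keepdims=True))
    alpha /= alpha.sum(-1, keepdims=True)           # softmax over j
    # ...and also the values aggregated for each item i.
    return (alpha[:, :, None] * (v[None, :, :] + rel_v)).sum(axis=1)
```

With `rel_k` and `rel_v` set to zero this reduces to ordinary self-attention; the relation embeddings are what let a single encoder handle schema encoding and schema linking uniformly.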