Recent neural text-to-SQL models can effectively translate natural language questions into corresponding SQL queries on unseen databases. Working mostly on the Spider dataset, researchers have proposed increasingly sophisticated solutions to the problem. Contrary to this trend, in this paper we focus on simplifications. We begin by building DuoRAT, a re-implementation of the state-of-the-art RAT-SQL model that, unlike RAT-SQL, uses only relation-aware or vanilla transformers as building blocks. We then perform several ablation experiments using DuoRAT as the baseline model. Our experiments confirm the usefulness of some techniques and point out the redundancy of others, including structural SQL features and features that link the question with the schema.