Relational databases are among the most widely used architectures for storing massive amounts of data. However, a barrier separates these databases from the average user, who often lacks the knowledge of a query language such as SQL that is required to interact with them. The NL2SQL task seeks deep learning approaches that remove this barrier by converting natural language questions into valid SQL queries. Given the sensitive nature of some databases and the growing need for data privacy, we present an approach with data privacy at its core. We pass RoBERTa embeddings and data-agnostic knowledge vectors into LSTM-based submodels to predict the final query. Although we do not achieve state-of-the-art results, we eliminate the need for table data, starting from model training, and achieve a test set execution accuracy of 76.7%. Because training does not depend on table data, the resulting model is capable of zero-shot learning from the natural language question and the table schema alone.
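For readers unfamiliar with the metric, execution accuracy is the fraction of predicted queries whose execution result matches that of the reference query, so two differently worded queries can both count as correct. The following is a minimal sketch of that idea using Python's built-in `sqlite3` module. The `players` table, the query pairs, and the function name `execution_accuracy` are hypothetical and chosen for illustration only; this is not the evaluation script behind the reported 76.7%.

```python
import sqlite3


def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) SQL pairs that return the same rows."""
    if not pairs:
        return 0.0
    correct = 0
    for predicted_sql, gold_sql in pairs:
        gold_rows = conn.execute(gold_sql).fetchall()
        try:
            predicted_rows = conn.execute(predicted_sql).fetchall()
        except sqlite3.Error:
            # A predicted query that fails to execute counts as wrong.
            continue
        # Compare results as unordered collections of rows.
        if sorted(predicted_rows, key=repr) == sorted(gold_rows, key=repr):
            correct += 1
    return correct / len(pairs)


# Hypothetical toy table, used only to make the example runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT, team TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?, ?)",
    [("Ann", "Red", 30), ("Bob", "Blue", 12), ("Cy", "Red", 7)],
)

# (predicted query, gold query) pairs; all are illustrative assumptions.
pairs = [
    # Different text, same rows: counted as correct.
    ("SELECT name FROM players WHERE points > 10",
     "SELECT name FROM players WHERE points >= 12"),
    # Equivalent aggregates: counted as correct.
    ("SELECT COUNT(*) FROM players WHERE team = 'Red'",
     "SELECT COUNT(name) FROM players WHERE team = 'Red'"),
    # Misspelled column, fails to execute: counted as wrong.
    ("SELECT nam FROM players", "SELECT name FROM players"),
    # Valid query with a different result: counted as wrong.
    ("SELECT MAX(points) FROM players", "SELECT MIN(points) FROM players"),
]

score = execution_accuracy(conn, pairs)
print(f"execution accuracy: {score:.2f}")
```

Note that the metric needs table contents only at evaluation time, when the queries are run; the model itself, as described above, sees only the question and the table schema.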