Translating natural language queries (NLQs) to Structured Query Language (SQL) in interfaces to relational databases is a challenging task that has been widely studied in the database community in recent years. Conventional rule-based systems use a pipeline of solutions, one for each step of the task: stop word filtering, tokenization, stemming/lemmatization, parsing, tagging, and translation. Recent works have mostly focused on the translation step and have overlooked the earlier steps, handling them with ad-hoc solutions. One of the most critical and challenging problems in the pipeline is keyword mapping: constructing a mapping between tokens in the query and relational database elements (tables, attributes, values, etc.). We define keyword mapping as a sequence tagging problem and propose a novel supervised, deep learning based approach that utilizes the POS tags of NLQs. Our approach, called \textit{DBTagger} (DataBase Tagger), is an end-to-end and schema-independent solution, which makes it practical for various relational databases. We evaluate our approach on eight different datasets and report new state-of-the-art accuracy results, $92.4\%$ on average. Our results also indicate that DBTagger is up to $10000$ times faster than its counterparts and scales to larger databases.
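To make the sequence tagging formulation concrete, the following minimal sketch shows the shape of the input (tokens plus POS tags) and the output (one schema tag per token). It is not the DBTagger model: the dictionary lookup stands in for the learned tagger, and the schema names, the tag labels (`TABLE:`, `ATTR:`, `VALUE:`, `O`), the example query, and the hand-written Penn Treebank style POS tags are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: lower-cased token -> schema-element tag.
# A learned tagger would replace this lookup; names are illustrative only.
LEXICON: Dict[str, str] = {
    "movies": "TABLE:movie",
    "directed": "ATTR:movie.director",
    "nolan": "VALUE:movie.director",
}


def tag_query(tokens: List[str], pos_tags: List[str]) -> List[Tuple[str, str, str]]:
    """Return (token, POS tag, schema tag) triples, one per input token.

    Tokens with no matching schema element receive the tag 'O'.
    """
    if len(tokens) != len(pos_tags):
        raise ValueError("tokens and pos_tags must align one-to-one")
    return [
        (tok, pos, LEXICON.get(tok.lower(), "O"))
        for tok, pos in zip(tokens, pos_tags)
    ]


if __name__ == "__main__":
    tokens = ["Show", "movies", "directed", "by", "Nolan"]
    pos_tags = ["VB", "NNS", "VBN", "IN", "NNP"]  # hand-written for this sketch
    for triple in tag_query(tokens, pos_tags):
        print(triple)
```

The point of the sketch is the one-tag-per-token output: every token in the query is labeled either with a database element or with `O`, which is what allows keyword mapping to be treated as sequence tagging.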