Natural Language Interfaces to Data

From arXiv. The full version of this manuscript, as published by Foundations and Trends in Databases, is available at http://dx.doi.org/10.1561/1900000078

Recent advances in natural language understanding (NLU) and natural language processing (NLP) have renewed interest in natural language interfaces to data, which give non-technical users an easy way to access and query data. Early systems evolved from keyword search and focused on simple factual queries, but the complexity of both the input sentences and the generated SQL queries has grown over time. More recently, there has also been considerable focus on conversational interfaces for data analytics, which give a range of non-technical users quick insights into the data. There are three main challenges in natural language querying (NLQ): (1) identifying the entities involved in the user utterance, (2) connecting those entities in a meaningful way over the underlying data source to interpret the user's intent, and (3) generating a structured query in a language such as SQL or SPARQL. There are two main approaches to interpreting a user's NLQ. Rule-based systems use semantic indices, ontologies, and knowledge graphs (KGs) to identify the entities in the query and understand the intended relationships between them, and use grammars to generate the target queries. With the advances in deep learning (DL)-based language models, many text-to-SQL approaches instead try to interpret the query holistically using DL models. Hybrid approaches that combine rule-based techniques with DL models are also emerging, drawing on the strengths of both. Conversational interfaces are the natural next step beyond one-shot NLQ: they exploit query context across multiple turns of conversation for disambiguation. In this article, we review the background technologies used in natural language interfaces and survey the different approaches to NLQ. We also describe conversational interfaces for data analytics and discuss several benchmarks used for NLQ research and evaluation.
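To make the three challenges concrete, here is a minimal rule-based sketch in Python. It is only an illustration under stated assumptions, not the method of any system surveyed in the article: the two-table schema, the `SEMANTIC_INDEX` phrase lookup, the `JOINS` foreign-key map, and all function names are hypothetical. Real systems use far richer indices, ontologies, and grammars, and also handle filters, aggregation, and ambiguity, none of which is attempted here.

```python
import re
from collections import deque

# Hypothetical semantic index: phrase -> (table, column) in a toy schema.
SEMANTIC_INDEX = {
    "salary": ("employees", "salary"),
    "employee": ("employees", "name"),
    "department": ("departments", "name"),
}

# Hypothetical foreign-key edges: (table_a, table_b) -> join condition.
JOINS = {
    ("employees", "departments"): "employees.dept_id = departments.id",
}


def identify_entities(utterance):
    """Challenge 1: map words in the utterance to schema elements."""
    found = []
    for word in re.findall(r"[a-z]+", utterance.lower()):
        # Try the word itself, then a crude singular form.
        for key in (word, word.rstrip("s")):
            entity = SEMANTIC_INDEX.get(key)
            if entity is not None and entity not in found:
                found.append(entity)
    return found


def connect_entities(tables):
    """Challenge 2: link the tables via BFS over the foreign-key graph."""
    graph = {}
    for (a, b), cond in JOINS.items():
        graph.setdefault(a, []).append((b, cond))
        graph.setdefault(b, []).append((a, cond))
    start = tables[0]
    # parent[t] = (previous table, join condition used to reach t)
    parent = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor, cond in graph.get(current, []):
            if neighbor not in parent:
                parent[neighbor] = (current, cond)
                queue.append(neighbor)
    joined, conditions = [start], []
    for target in tables[1:]:
        if target not in parent:
            raise ValueError(f"no join path to {target}")
        # Walk back from the target to the start, then replay in order.
        path, node = [], target
        while parent[node] is not None:
            prev, cond = parent[node]
            path.append((node, cond))
            node = prev
        for table, cond in reversed(path):
            if table not in joined:
                joined.append(table)
                conditions.append(cond)
    return joined, conditions


def generate_sql(utterance):
    """Challenge 3: emit a SQL query for the identified, connected entities."""
    entities = identify_entities(utterance)
    if not entities:
        raise ValueError("no known entities in utterance")
    tables = []
    for table, _ in entities:
        if table not in tables:
            tables.append(table)
    joined, conditions = connect_entities(tables)
    select = ", ".join(f"{table}.{column}" for table, column in entities)
    sql = f"SELECT {select} FROM {joined[0]}"
    for table, cond in zip(joined[1:], conditions):
        sql += f" JOIN {table} ON {cond}"
    return sql


if __name__ == "__main__":
    print(generate_sql("Show the salary of each employee by department"))
```

Under these assumptions, the example utterance mentions columns from two tables, so the sketch selects them and adds the single join that connects `employees` to `departments`. A DL-based text-to-SQL model would replace all three hand-written steps with one learned mapping, and a hybrid system might keep the schema-aware join step while learning the entity identification.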
