Speech-based input has gained significant momentum as smartphones and tablets have become part of daily life, since voice is among the easiest and most efficient means of human-computer interaction. This paper works towards more effective speech-based interfaces for querying structured data in relational databases. We first identify a new task, Speech-to-SQL, which aims to understand the information conveyed by human speech and translate it directly into structured query language (SQL) statements. A naive solution works in a cascaded manner: an automatic speech recognition (ASR) component followed by a text-to-SQL component. However, this approach requires a high-quality ASR system and suffers from error compounding between the two components, which limits its performance. To address these challenges, we propose a novel end-to-end neural architecture, SpeechSQLNet, which translates human speech directly into SQL queries without an external ASR step. SpeechSQLNet has the advantage of making full use of the rich linguistic information present in speech. To the best of our knowledge, this is the first attempt to synthesize SQL directly from arbitrary natural language questions, rather than from a natural language-based version of SQL or its variants with a limited SQL grammar. To validate the proposed problem and model, we construct a dataset named SpeechQL by building on widely used text-to-SQL datasets. Extensive experimental evaluation on this dataset shows that SpeechSQLNet can synthesize high-quality SQL queries directly from human speech, outperforming various competitive counterparts as well as the cascaded methods in terms of exact-match accuracy.
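As an illustration of the evaluation metric named above, the following is a minimal sketch of a string-level exact-match accuracy. It is an assumption for illustration only: the function names `normalize_sql` and `exact_match_accuracy` are hypothetical, the normalization (lowercasing, whitespace collapsing, dropping a trailing semicolon) is a simplification, and the metric actually used in the experiments may compare SQL components rather than raw strings.

```python
import re


def normalize_sql(query: str) -> str:
    """Hypothetical normalizer: trims, drops a trailing semicolon,
    collapses whitespace, and lowercases the query.

    Note: lowercasing also changes string literals, so this is only
    a rough approximation of semantic equality.
    """
    query = query.strip().rstrip(";").strip()
    return re.sub(r"\s+", " ", query).lower()


def exact_match_accuracy(predictions, references):
    """Fraction of predicted queries that equal their reference
    query after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    hits = sum(
        normalize_sql(pred) == normalize_sql(ref)
        for pred, ref in zip(predictions, references)
    )
    return hits / len(references)


if __name__ == "__main__":
    # Toy queries invented for this example, not taken from SpeechQL.
    preds = [
        "SELECT name FROM singer;",
        "select count(*)   from singer",
        "SELECT age FROM singer",
    ]
    refs = [
        "select name from singer",
        "SELECT COUNT(*) FROM singer",
        "SELECT name FROM singer",
    ]
    print(exact_match_accuracy(preds, refs))
```

Under this simplified definition, a prediction counts as correct only if the whole normalized query matches, so a single wrong column or table name makes the query a miss.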