Interactions based on automatic speech recognition (ASR) have become widely used, and speech input is increasingly employed to create documents. However, because speech offers no easy way to distinguish commands from text to be entered, misrecognitions are difficult to identify and correct, so documents must be edited and corrected manually. Entering symbols and commands is also challenging because they may be misrecognized as literal text. To address these problems, this study proposes a speech interaction method called DualVoice, in which commands are entered in a whispered voice and text in a normal voice. The proposed method requires no specialized hardware beyond a regular microphone, enabling completely hands-free interaction. It can be used in a wide range of situations where speech recognition is already available, from text entry to mobile and wearable computing. Two neural networks were designed in this study: one to discriminate normal speech from whispered speech, and the other to recognize whispered speech. A prototype text input system was then developed to demonstrate how normal and whispered voices can be combined in speech-based text entry. Other potential applications of DualVoice are also discussed.
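To make the two-network pipeline concrete, the following is a minimal sketch of how a whisper/normal discriminator could route each utterance to a command recognizer or a text recognizer. This is an illustrative assumption, not the paper's actual implementation: the `WhisperDiscriminator` architecture, the log-mel front end, and the `recognize_text`/`recognize_command` callables are all hypothetical placeholders.

```python
import torch
import torch.nn as nn
import torchaudio

class WhisperDiscriminator(nn.Module):
    """Binary classifier: normal (0) vs whispered (1) speech.
    Hypothetical small CNN over log-mel spectrograms; the paper's
    actual network design is not reproduced here."""
    def __init__(self, n_mels=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, 2)

    def forward(self, mel):              # mel: (batch, 1, n_mels, time)
        x = self.conv(mel).flatten(1)    # (batch, 32)
        return self.fc(x)                # logits over {normal, whisper}

def route_utterance(waveform, sample_rate, discriminator,
                    recognize_text, recognize_command):
    """Route whispered audio to the command recognizer and
    normal-voice audio to the text recognizer."""
    mel = torchaudio.transforms.MelSpectrogram(
        sample_rate=sample_rate, n_mels=64)(waveform)
    mel = mel.log1p().unsqueeze(1)       # (batch, 1, n_mels, time)
    with torch.no_grad():
        is_whisper = discriminator(mel).argmax(dim=1).item() == 1
    return recognize_command(waveform) if is_whisper else recognize_text(waveform)
```

Under this sketch, the second network described in the abstract (the whispered-speech recognizer) would sit behind `recognize_command`, since whispered audio typically degrades the accuracy of a recognizer trained only on normal speech.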