Voice-operated digital devices are rapidly becoming more widely available. However, the situations in which voice interfaces can be used remain limited. For example, speaking in public places annoys the people nearby, and confidential information should not be spoken aloud. Environmental noise can also reduce the accuracy of speech recognition. To address these limitations, we propose a system that detects a user's unvoiced utterances. From internal information observed by an ultrasonic imaging sensor attached to the underside of the jaw, the system recognizes the content of an utterance without the user actually voicing it. Our deep neural network model obtains acoustic features from a sequence of ultrasound images. We confirmed that audio signals generated by our system can control existing smart speakers. We also observed that users can learn to adjust their oral movements to improve the accuracy with which their speech is recognized.