Voice Activity Detection (VAD) is not an easy task when the input audio signal is noisy, and it becomes even more complicated when the input is not an audio recording at all. This is the case with Silent Speech Interfaces (SSI), where we record the movement of the articulatory organs during speech and aim to reconstruct the speech signal from this recording. Our SSI system synthesizes speech from ultrasound videos of the tongue movement, and the quality of the resulting speech signal is evaluated by metrics such as the mean squared error loss of the underlying neural network and the Mel-Cepstral Distortion (MCD) of the reconstructed speech compared to the original. Here, we first demonstrate that the amount of silence in the training data influences both the MCD evaluation metric and the performance of the neural network model. Then, we train a convolutional neural network classifier to separate silent and speech-containing ultrasound tongue images, using a conventional VAD algorithm to create the training labels from the corresponding speech signal. In our experiments, the ultrasound-based speech/silence separator achieved a classification accuracy of about 85\% and an AUC score around 86\%.
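For context, the MCD metric mentioned above is commonly computed per frame from mel-cepstral coefficients; the abstract does not specify the exact variant, so the formula below is the widely used convention rather than necessarily the one applied here:
\[
\mathrm{MCD} = \frac{10}{\ln 10}\,\sqrt{2 \sum_{d=1}^{D} \left( mc_d^{(\mathrm{ref})} - mc_d^{(\mathrm{synth})} \right)^2 },
\]
where $mc_d^{(\mathrm{ref})}$ and $mc_d^{(\mathrm{synth})}$ are the $d$-th mel-cepstral coefficients of the original and reconstructed speech frames, and $D$ is the analysis order (the summation sometimes excludes the energy term $d=0$, depending on the convention).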
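A minimal sketch of the labeling-plus-classification pipeline described above, assuming the WebRTC VAD (via the \texttt{webrtcvad} Python package) as the conventional VAD and a small illustrative Keras CNN; the frame size, network layout, and function names are placeholders of ours, not the architecture evaluated in the paper:

\begin{verbatim}
# Sketch: derive speech/silence labels from the parallel audio with a
# conventional VAD, then classify single ultrasound tongue images.
import numpy as np
import webrtcvad
import tensorflow as tf

def label_audio_frames(pcm16, sample_rate=16000, frame_ms=30, mode=2):
    """One 0/1 speech label per frame of 16-bit mono PCM audio.
    webrtcvad accepts 10/20/30 ms frames at 8/16/32/48 kHz."""
    vad = webrtcvad.Vad(mode)           # mode 0 (lenient) .. 3 (aggressive)
    step = sample_rate * frame_ms // 1000
    labels = []
    for start in range(0, len(pcm16) - step + 1, step):
        frame_bytes = pcm16[start:start + step].tobytes()
        labels.append(int(vad.is_speech(frame_bytes, sample_rate)))
    return np.array(labels)

def build_speech_silence_cnn(height=64, width=128):
    """Small CNN mapping one ultrasound image to P(speech);
    the layer sizes are illustrative only."""
    return tf.keras.Sequential([
        tf.keras.layers.Input((height, width, 1)),
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
\end{verbatim}

In such a setup, each ultrasound image would be paired with the VAD label of the temporally corresponding audio frame, and the classifier trained with binary cross-entropy; accuracy and AUC can then be measured on held-out recordings.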