CITISEN: A Deep Learning-Based Speech Signal-Processing Mobile Application

In this study, we present CITISEN, a deep learning-based speech signal-processing mobile application that performs three functions: speech enhancement (SE), model adaptation (MA), and acoustic scene conversion (ASC). For SE, CITISEN reduces the noise components in speech signals and thereby improves their clarity and intelligibility. When CITISEN encounters noisy utterances with unknown speakers or noise types, the MA function improves SE performance by adapting the SE model with a few audio files. Finally, for ASC, CITISEN converts the current background sound into a different background sound. Objective evaluations and subjective listening tests confirmed the effectiveness of the SE, MA, and ASC functions. Moreover, the MA experimental results indicated that short-time objective intelligibility (STOI) and perceptual evaluation of speech quality (PESQ) scores could be improved by approximately 5% and 10%, respectively. These promising results suggest that CITISEN could serve as a front-end processor for various speech-related services such as voice communication, assistive hearing devices, and virtual reality headsets. In addition, CITISEN can be used as a platform for running and evaluating newly developed deep learning-based SE models, and the models can be flexibly extended to address various noise environments and users.
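To make the ASC function more concrete, the following is a minimal sketch of one plausible way to swap backgrounds: suppress the original background with an SE step, then add a new background recording at a chosen signal-to-noise ratio. The abstract does not describe how CITISEN implements ASC, so this two-stage pipeline is an assumption. The names `mix_at_snr`, `convert_scene`, and `enhance` are hypothetical, and `enhance` stands in for any SE model.

```python
import math


def rms(signal):
    """Root-mean-square level of a non-empty sequence of samples."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))


def mix_at_snr(speech, background, snr_db):
    """Add `background` to `speech`, scaled so that the speech-to-background
    level ratio equals `snr_db` decibels. Inputs are equal-length sequences."""
    if len(speech) != len(background):
        raise ValueError("speech and background must have the same length")
    bg_rms = rms(background)
    if bg_rms == 0.0:
        # A silent background leaves the speech unchanged.
        return list(speech)
    # snr_db = 20 * log10(rms_speech / rms_background), solved for rms_background.
    target_rms = rms(speech) / (10 ** (snr_db / 20))
    gain = target_rms / bg_rms
    return [s + gain * b for s, b in zip(speech, background)]


def convert_scene(noisy, enhance, new_background, snr_db=10.0):
    """Hypothetical ASC pipeline (an assumption, not the authors' method):
    remove the old background with `enhance`, then add the new one."""
    cleaned = enhance(noisy)
    return mix_at_snr(cleaned, new_background, snr_db)
```

In this sketch, `enhance` can be any callable that maps a list of samples to a list of the same length, so a trained SE model could be substituted without changing the mixing step.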
