This study proposes a fully automated system for speech correction and accent reduction. Consider an application scenario in which a speech recording contains errors, e.g., inappropriate words or mispronunciations, that need to be corrected. The proposed system, named CorrectSpeech, performs the correction in three steps: (1) recognizing the recorded speech and converting it into a time-stamped symbol sequence, (2) aligning the recognized symbol sequence with the target text to determine the locations and types of the required edit operations, and (3) generating the corrected speech. Experiments show that the quality and naturalness of the corrected speech depend on the performance of the speech recognition and alignment modules, as well as on the granularity of the editing operations. The proposed system is evaluated on two corpora: a manually perturbed version of VCTK, and L2-ARCTIC. The results demonstrate that the system is able to correct mispronunciations and reduce accent in speech recordings. Audio samples are available online at https://daxintan-cuhk.github.io/CorrectSpeech/.
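To make the alignment step more concrete, the following is a minimal sketch of how a recognized symbol sequence could be compared with a target text to obtain the locations and types of edit operations. It uses a standard Levenshtein (edit-distance) alignment, which is an assumption for illustration and not necessarily the alignment method used in CorrectSpeech. The function name `align_edits`, the tuple layout, and the word-level tokens are hypothetical, and the time stamps produced by the recognizer are omitted.

```python
from typing import List, Tuple

# (operation, index in recognized sequence, recognized symbol, target symbol)
Edit = Tuple[str, int, str, str]


def align_edits(recognized: List[str], target: List[str]) -> List[Edit]:
    """Return the edits that turn `recognized` into `target` (illustrative sketch)."""
    n, m = len(recognized), len(target)

    # cost[i][j] = minimum number of edits turning recognized[:i] into target[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = int(recognized[i - 1] != target[j - 1])
            cost[i][j] = min(
                cost[i - 1][j - 1] + mismatch,  # match or substitution
                cost[i - 1][j] + 1,             # deletion from recognized
                cost[i][j - 1] + 1,             # insertion of a target symbol
            )

    # Trace back from the end of both sequences to recover the edit operations.
    edits: List[Edit] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            mismatch = int(recognized[i - 1] != target[j - 1])
            if cost[i][j] == cost[i - 1][j - 1] + mismatch:
                if mismatch:
                    edits.append(("substitute", i - 1, recognized[i - 1], target[j - 1]))
                i -= 1
                j -= 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            edits.append(("delete", i - 1, recognized[i - 1], ""))
            i -= 1
        else:
            # The index is the position in `recognized` before which to insert.
            edits.append(("insert", i, "", target[j - 1]))
            j -= 1

    edits.reverse()
    return edits
```

In a system like the one described, each reported index would be mapped back to the time stamps of the recognized symbols to locate the audio region to regenerate. The same routine works at word or phoneme level, which is one way the granularity of the editing operations could be varied.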