In most practical scenarios, an announcement system must deliver speech messages in a noisy environment where the background noise cannot be cancelled. This local noise reduces speech intelligibility and increases the listener's effort, and thus hampers the effectiveness of the announcement system. It has been reported that the voices of professional announcers are clearer and more comprehensible than those of non-expert speakers in noisy environments. This finding suggests that speech intelligibility may be related to the speaking style of professional announcers, which can be adapted using a voice conversion method. Motivated by this idea, this paper proposes a method for enhancing speech intelligibility in noisy environments by applying voice conversion to non-professional voices. We found that professional announcers and non-professional speakers form separate clusters in the speaker embedding plane. This implies that speech intelligibility can be controlled as an independent feature of speaker individuality. To examine the advantage of the converted voice in noisy environments, we conducted experiments using test words masked by pink noise at different signal-to-noise ratio (SNR) levels. The results of objective and subjective evaluations confirm that the speech intelligibility of the converted voice is higher than that of the original voice in low-SNR conditions.
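The masking condition described above, test words mixed with pink noise at a chosen SNR, can be sketched as follows. This is a minimal illustration and not the authors' procedure: the frequency-domain pink-noise generator, the function names `pink_noise` and `mix_at_snr`, the 16 kHz sampling rate, the sine tone standing in for a recorded test word, and the SNR values are all assumptions made for the example.

```python
import numpy as np

def pink_noise(n_samples, rng):
    """Approximate pink (1/f power) noise by shaping white noise in the frequency domain."""
    white = rng.standard_normal(n_samples)
    spectrum = np.fft.rfft(white)
    freqs = np.fft.rfftfreq(n_samples)
    scale = np.zeros_like(freqs)               # DC bin stays at zero
    scale[1:] = 1.0 / np.sqrt(freqs[1:])       # amplitude ~ f^(-1/2), so power ~ 1/f
    noise = np.fft.irfft(spectrum * scale, n=n_samples)
    return noise / np.std(noise)               # normalize to unit variance

def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so that speech power / noise power matches snr_db, then add it."""
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    target_noise_power = speech_power / (10.0 ** (snr_db / 10.0))
    scaled_noise = noise * np.sqrt(target_noise_power / noise_power)
    return speech + scaled_noise, scaled_noise

# Hypothetical usage: a 1-second tone stands in for a recorded test word.
sample_rate = 16000                            # assumed sampling rate
t = np.arange(sample_rate) / sample_rate
speech = np.sin(2.0 * np.pi * 220.0 * t)
rng = np.random.default_rng(0)
noise = pink_noise(len(speech), rng)
for snr_db in (-5.0, 0.0, 10.0):               # assumed SNR levels
    mixed, scaled_noise = mix_at_snr(speech, noise, snr_db)
```

Here the SNR is computed over the whole signal. An evaluation on real recordings might instead measure speech power only over active speech segments, which would change the scaling.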