The Metaverse extends the physical world into a new dimension, in which the physical environment and the Metaverse environment can be directly connected and entered. Voice is an indispensable communication medium in both the real world and the Metaverse, so fusing the voice with environment effects is important for user immersion in the Metaverse. In this paper, we propose a voice-conversion-based method for converting speech to a target environment effect. The proposed method, named MetaSpeech, introduces an environment effect module that contains an effect extractor, which extracts the environment information, and an effect encoder, which encodes the environment effect condition. A gradient reversal layer is used for adversarial training so that the speech content and speaker information are preserved while the environment effects are disentangled. Experimental results on the public LJSpeech dataset with four environment effects show that the proposed model can perform the specified environment effect conversion and outperforms baseline methods taken from the voice conversion task.
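The gradient reversal layer mentioned above can be illustrated with a minimal sketch. This is not the MetaSpeech implementation: it only shows the general mechanism, in which features pass through unchanged in the forward direction while the gradient coming back from an adversarial branch is negated and scaled. The class name `GradientReversal`, the strength parameter `lam`, and the example values are assumptions made for illustration. In an automatic differentiation framework the same idea would normally be written as a custom differentiable operation rather than with explicit `forward` and `backward` methods.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class GradientReversal:
    """Identity in the forward pass; scales gradients by -lam in the backward pass."""

    lam: float = 1.0  # hypothetical reversal strength, not taken from the paper

    def forward(self, features: List[float]) -> List[float]:
        # Features are passed through unchanged.
        return list(features)

    def backward(self, grad_output: List[float]) -> List[float]:
        # The gradient arriving from the adversarial branch is negated and
        # scaled, so the upstream encoder is pushed to *remove* the
        # information that the adversarial branch is trying to predict.
        return [-self.lam * g for g in grad_output]


def demo() -> Dict[str, List[float]]:
    """Run one forward and one backward step on illustrative values."""
    grl = GradientReversal(lam=0.5)
    features = [0.2, -1.0, 3.0]               # illustrative encoder output
    out = grl.forward(features)
    grad_from_adversary = [1.0, -2.0, 0.5]    # illustrative upstream gradient
    grad_to_encoder = grl.backward(grad_from_adversary)
    return {"out": out, "grad": grad_to_encoder}


if __name__ == "__main__":
    print(demo())
```

With `lam=0.5`, the forward output equals the input and each backward gradient component is multiplied by -0.5, which is the sign flip that makes the training adversarial.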