Mel Spectrogram Inversion with Stable Pitch

Vocoders are models that transform a low-dimensional spectral representation of an audio signal, typically the mel spectrogram, into a waveform. Modern speech generation pipelines use a vocoder as their final component. Recent vocoder models developed for speech achieve a high degree of realism, so it is natural to ask how they would perform on music signals. Compared to speech, the heterogeneity and structure of musical sound texture pose new challenges. In this work we focus on one specific artifact that some vocoder models designed for speech tend to exhibit when applied to music: the perceived instability of pitch when synthesizing sustained notes. We argue that the characteristic sound of this artifact is due to a lack of horizontal phase coherence, which is often the result of using a time-domain target space with a model that is invariant to time shifts, such as a convolutional neural network. We propose a new vocoder model that is specifically designed for music. Key to improving pitch stability is the choice of a shift-invariant target space that consists of the magnitude spectrum and the phase gradient. We discuss the reasons that led us to reformulate the vocoder task, outline a working example, and evaluate it on musical signals. Measured with a novel harmonic error metric, our method improves the reconstruction of sustained notes and chords by 60% and 10% with respect to existing models.
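The shift-invariance argument can be illustrated with a small NumPy sketch. It is not the proposed model: it only assumes that "phase gradient" can be approximated by the wrapped phase difference between consecutive STFT frames, and the helper names, window, FFT size, and hop size are illustrative choices rather than values from the paper. For a sustained sine, shifting the signal in time changes the raw STFT phase, while the magnitude and the frame-to-frame phase advance stay the same.

```python
import numpy as np

def stft(x, n_fft=1024, hop=256):
    """Hann-windowed STFT; returns an array of shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def magnitude_and_phase_advance(spec):
    """Magnitude plus the wrapped phase difference between consecutive frames.

    Assumption: the frame-to-frame difference stands in for the time-direction
    phase gradient; the paper's exact definition may differ.
    """
    mag = np.abs(spec)
    dphi = np.angle(spec[1:] * np.conj(spec[:-1]))  # wrapped to (-pi, pi]
    return mag, dphi

# Hypothetical test signal: one second of a sustained 440 Hz note.
sr, f0, n_fft, hop = 16000, 440.0, 1024, 256
t = np.arange(sr) / sr
note = np.sin(2 * np.pi * f0 * t)
shifted = np.sin(2 * np.pi * f0 * (t + 10 / sr))  # same note, 10 samples earlier

spec_a, spec_b = stft(note, n_fft, hop), stft(shifted, n_fft, hop)
mag_a, dphi_a = magnitude_and_phase_advance(spec_a)
mag_b, dphi_b = magnitude_and_phase_advance(spec_b)

k = int(round(f0 * n_fft / sr))  # STFT bin closest to the note's frequency

# Raw phase at bin k differs between the two signals (not shift-invariant).
raw_phase_offset = np.angle(spec_b[:, k] * np.conj(spec_a[:, k]))
# Phase advance per hop expected for a stationary sinusoid at f0.
expected_advance = np.angle(np.exp(2j * np.pi * f0 * hop / sr))

print("raw phase offset (rad):", raw_phase_offset.mean())
print("phase advance, original vs shifted:", dphi_a[:, k].mean(), dphi_b[:, k].mean())
print("expected phase advance (rad):", expected_advance)
```

In this sketch the phase advance at the note's bin is constant across frames and encodes the note's frequency, which is the kind of horizontal phase coherence the abstract associates with stable pitch.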
