Vocoders are models that transform a low-dimensional spectral representation of an audio signal, typically the mel spectrogram, into a waveform. Modern speech generation pipelines use a vocoder as their final component. Recent vocoder models developed for speech achieve such a high degree of realism that it is natural to ask how they would perform on music signals. Compared to speech, the heterogeneity and structure of musical sound textures pose new challenges. In this work we focus on one specific artifact that some vocoder models designed for speech tend to exhibit when applied to music: the perceived instability of pitch when synthesizing sustained notes. We argue that the characteristic sound of this artifact is due to a lack of horizontal phase coherence, which is often the result of using a time-domain target space with a model that is invariant to time shifts, such as a convolutional neural network. We propose a new vocoder model that is specifically designed for music. Key to improving pitch stability is the choice of a shift-invariant target space that consists of the magnitude spectrum and the phase gradient. We discuss the reasons that led us to reformulate the vocoder task, outline a working example, and evaluate it on musical signals. Measured with a novel harmonic error metric, our method improves the reconstruction of sustained notes and chords by 60% and 10%, respectively, relative to existing models.
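The shift-invariance argument can be illustrated with a small numerical sketch. The code below is not the proposed vocoder; it only assumes a plain NumPy STFT and approximates the "phase gradient" by the wrapped frame-to-frame phase difference (the time direction only), which may differ from the definition used in the paper. All names (`stft`, `wrap`, `targets`, `shift`) and parameter values are hypothetical choices made for illustration. For a sustained sinusoid, a small time shift changes the raw phase at the peak bin, while the magnitude and the phase difference stay essentially the same.

```python
import numpy as np

def stft(x, n_fft=1024, hop=256):
    """Hann-windowed STFT; returns an array of shape (frames, bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)

def wrap(phase):
    """Wrap phase values to the interval (-pi, pi]."""
    return np.angle(np.exp(1j * phase))

def targets(spec):
    """Magnitude and wrapped frame-to-frame phase difference (assumed stand-in
    for the phase gradient along time)."""
    magnitude = np.abs(spec)
    phase_diff = wrap(np.diff(np.angle(spec), axis=0))
    return magnitude, phase_diff

# A sustained 440 Hz note and the same note shifted by a few samples.
sr, f0, shift, n_fft = 16000, 440.0, 10, 1024
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * f0 * t)
x_shifted = np.sin(2 * np.pi * f0 * (t + shift / sr))

spec_a, spec_b = stft(x, n_fft), stft(x_shifted, n_fft)
mag_a, grad_a = targets(spec_a)
mag_b, grad_b = targets(spec_b)

k = int(round(f0 * n_fft / sr))  # bin closest to the note's frequency

# Raw phase depends on the shift; magnitude and phase difference do not.
raw_phase_gap = np.mean(np.abs(wrap(np.angle(spec_a[:, k]) - np.angle(spec_b[:, k]))))
mag_gap = np.max(np.abs(mag_a[:, k] - mag_b[:, k])) / np.max(mag_a[:, k])
grad_gap = np.max(np.abs(wrap(grad_a[:, k] - grad_b[:, k])))

print(f"raw phase gap:       {raw_phase_gap:.4f} rad")
print(f"relative mag gap:    {mag_gap:.2e}")
print(f"phase-gradient gap:  {grad_gap:.2e} rad")
```

In this toy setting the raw phase gap is large while the other two gaps are close to zero, which is the property that makes magnitude plus phase gradient a plausible target for a time-shift-invariant model.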