Music source separation is the task of extracting an estimate of one or more isolated sources or instruments (for example, drums or vocals) from musical audio. Music demixing, or unmixing, is the case where the musical audio is separated into estimates of all of its constituent sources, which can be summed back to the original mixture. The Music Demixing Challenge was created to inspire new demixing research. Open-Unmix (UMX) and its improved variant CrossNet-Open-Unmix (X-UMX) were included in the challenge as baselines. Both models use the Short-Time Fourier Transform (STFT) as the representation of music signals. The time-frequency uncertainty principle states that the STFT of a signal cannot have maximal resolution in both time and frequency, and this tradeoff in time-frequency resolution can significantly affect music demixing results. Our proposed adaptation of UMX replaces the STFT with the sliCQT, a time-frequency transform with varying time-frequency resolution. Unfortunately, our model, xumx-sliCQ, achieved lower demixing scores than UMX.
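The time-frequency tradeoff of the STFT described above can be illustrated with a minimal sketch. It assumes the nominal resolutions of a fixed-window STFT: the window duration as the time resolution and the DFT bin spacing as the frequency resolution. The function name `stft_resolution`, the 44.1 kHz sample rate, and the window sizes are illustrative assumptions, not values taken from the text, and the effective resolution in practice also depends on the window shape.

```python
def stft_resolution(sample_rate_hz, window_size):
    """Nominal (time, frequency) resolution of an STFT with a fixed window.

    Hypothetical helper for illustration only.
    time resolution      = window duration in seconds
    frequency resolution = DFT bin spacing in Hz
    """
    time_res_s = window_size / sample_rate_hz
    freq_res_hz = sample_rate_hz / window_size
    return time_res_s, freq_res_hz


# Assumed sample rate and window sizes, chosen only to show the trend.
SAMPLE_RATE_HZ = 44100
WINDOW_SIZES = (256, 1024, 4096, 16384)

resolutions = {n: stft_resolution(SAMPLE_RATE_HZ, n) for n in WINDOW_SIZES}

for n, (time_res_s, freq_res_hz) in resolutions.items():
    # A longer window narrows the frequency bins but smears events in time.
    print(f"window={n:6d}  time={time_res_s * 1000:8.2f} ms  freq={freq_res_hz:8.2f} Hz")
```

Under these assumptions the product of the two nominal resolutions is always 1, so improving one necessarily worsens the other. A transform with varying time-frequency resolution, such as the sliCQT, avoids committing to a single window size for all frequencies.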