Commercial music today is far louder and has a far more heavily compressed dynamic range than in the past. Yet music source separation research has not thoroughly considered these characteristics, resulting in a domain mismatch between the laboratory and the real world. In this paper, we confirm that this domain mismatch negatively affects the performance of music source separation networks. To this end, we first create out-of-domain evaluation datasets, musdb-L and XL, by mimicking the music mastering process. We then quantitatively verify that the performance of state-of-the-art algorithms deteriorates significantly on our datasets. Lastly, we propose LimitAug, a data augmentation method that reduces the domain mismatch by applying an online limiter during the training data sampling process. We confirm that it not only alleviates the performance degradation on our out-of-domain datasets, but also yields higher performance on in-domain data.
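To make the idea of an online limiter in the sampling process concrete, the following is a minimal sketch and not the authors' implementation. The function names `peak_limiter` and `limit_aug_sample`, the random pre-gain range, the threshold, and the release constant are all hypothetical choices made for illustration. The limiter is a deliberately simple feed-forward peak limiter, and scaling each stem by the mixture's gain curve is an assumption of this sketch, chosen so that the targets still sum to the limited mixture.

```python
import numpy as np

def peak_limiter(x, threshold=0.5, release=0.999):
    """Toy feed-forward peak limiter (illustrative, not a mastering limiter).

    Returns the limited signal and the per-sample gain that was applied.
    """
    gain = np.ones(len(x))
    env = 0.0
    for n, peak in enumerate(np.abs(x)):
        # Envelope follower: instant attack, exponential release.
        env = max(peak, env * release)
        if env > threshold:
            gain[n] = threshold / env
    return x * gain, gain

def limit_aug_sample(stems, rng, gain_db_range=(0.0, 12.0), threshold=0.5):
    """Build one training example from mono stems (dict: name -> 1-D array).

    Assumption: a random pre-gain drives the mixture into the limiter by a
    varying amount, and the same gain curve is applied to every stem.
    """
    pre_gain = 10.0 ** (rng.uniform(*gain_db_range) / 20.0)
    mixture = pre_gain * sum(stems.values())
    limited_mix, gain = peak_limiter(mixture, threshold)
    targets = {name: pre_gain * s * gain for name, s in stems.items()}
    return limited_mix, targets

# Example usage with synthetic stems standing in for real audio.
rng = np.random.default_rng(0)
stems = {
    "vocals": 0.4 * np.sin(np.linspace(0.0, 100.0, 2000)),
    "drums": 0.4 * rng.standard_normal(2000),
}
mix, targets = limit_aug_sample(stems, rng)
```

Because the limiter runs each time an example is drawn, every example is limited by a different amount, which is the sense in which such a limiter is online rather than applied once to a fixed dataset.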