The deep learning community has witnessed exponentially growing interest in self-supervised learning (SSL). However, how to build a framework that learns useful representations of raw music waveforms in a self-supervised manner remains unexplored. In this work, we design Music2Vec, a framework that explores different SSL algorithmic components and tricks for music audio recordings. Our model achieves results comparable to those of Jukebox, the state-of-the-art (SOTA) music SSL model, despite being significantly smaller, with fewer than 2% of Jukebox's parameters. The model will be released on Hugging Face (https://huggingface.co/m-a-p/music2vec-v1).