Despite the central role that melody plays in music perception, it remains an open challenge in music information retrieval to reliably detect the notes of the melody present in an arbitrary music recording. A key challenge in melody transcription is building methods that can handle broad audio containing any number of instrument ensembles and musical styles: existing strategies work well for some melody instruments or styles but not all. To confront this challenge, we leverage representations from Jukebox (Dhariwal et al., 2020), a generative model of broad music audio, thereby improving performance on melody transcription by 20% relative to conventional spectrogram features. Another obstacle in melody transcription is a lack of training data: we derive a new dataset containing 50 hours of melody transcriptions from crowdsourced annotations of broad music. The combination of generative pre-training and a new dataset for this task results in 77% stronger performance on melody transcription relative to the strongest available baseline. By pairing our new melody transcription approach with solutions for beat detection, key estimation, and chord recognition, we build Sheet Sage, a system capable of transcribing human-readable lead sheets directly from music audio. Audio examples can be found at https://chrisdonahue.com/sheetsage and code at https://github.com/chrisdonahue/sheetsage.
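To make the overall approach concrete, the following PyTorch sketch pairs a frozen feature extractor (standing in for intermediate activations of a pre-trained generative model such as Jukebox) with a small Transformer head that predicts per-frame melody pitches. This is a minimal illustration only: the class names, feature dimensionality, frame rate, and output vocabulary are assumptions for the sketch, not the paper's exact configuration.

```python
import torch
import torch.nn as nn


class PretrainedAudioFeatures(nn.Module):
    """Hypothetical stand-in for frozen activations of a pre-trained
    generative audio model (e.g. Jukebox). Dimensions and frame rate
    are illustrative assumptions."""

    def __init__(self, feature_dim=4800, hop=320):
        super().__init__()
        self.feature_dim = feature_dim
        self.hop = hop

    @torch.no_grad()
    def forward(self, audio):  # audio: (batch, samples)
        # Placeholder: in practice this would run the frozen generative
        # model and return one feature vector per audio frame.
        num_frames = audio.shape[-1] // self.hop
        return torch.randn(audio.shape[0], num_frames, self.feature_dim)


class MelodyTranscriber(nn.Module):
    """Small Transformer head mapping frame-level features to pitch logits."""

    def __init__(self, feature_dim=4800, d_model=512, num_pitches=89):
        super().__init__()
        self.proj = nn.Linear(feature_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        # Assumed vocabulary: 88 pitches plus one "no melody note" class.
        self.head = nn.Linear(d_model, num_pitches)

    def forward(self, features):  # features: (batch, frames, feature_dim)
        x = self.proj(features)
        x = self.encoder(x)
        return self.head(x)  # (batch, frames, num_pitches)


# Usage sketch: frozen pre-trained features in, frame-level melody logits out.
extractor = PretrainedAudioFeatures()
model = MelodyTranscriber()
audio = torch.randn(1, 16000 * 10)  # 10 seconds of audio at an assumed 16 kHz
features = extractor(audio)
logits = model(features)
```

The design choice this illustrates is the one the abstract describes: rather than training a transcription model on spectrograms from scratch, a lightweight task-specific head is trained on top of representations from a large generative model of broad music audio.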