Often I hear a piece of music and wonder what it is called. There are applications, such as Shazam, that provide music matching, but they have a limitation: the same piece performed by the same musician cannot be identified unless it is the same recording. Shazam identifies the recording, not the music, because it matches the variation in volume rather than the frequencies of the sound. This research attempts to match music the way humans understand it: by its frequency spectrum, not its volume variation. Essentially, the idea is to precompute the frequency spectra of all the music in the database, then take the unknown piece and try to match its spectrum against every segment of every piece in the database. I did this by sliding a window over each piece in 0.1-second steps and computing an error score: take the magnitude of each spectrum, normalize the audio, subtract the normalized arrays, and sum the absolute differences. The segment with the lowest error is taken as the candidate match. Matching performance proved to depend on the complexity of the music. Matching simple music, such as single-note pieces, was successful. More complex pieces, such as Chopin's Ballade No. 4, were not: the algorithm could not produce low error values for any piece in the database. I suspect this has to do with having too many notes; mismatches in the higher harmonics add up to a significant amount of error, which swamps the calculation.
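The sliding-window matching described above can be sketched as follows. This is a minimal illustration, not the actual implementation: the function names, the 0.5-second window length, and the use of a plain FFT magnitude spectrum are my assumptions; only the 0.1-second hop and the normalized sum-of-absolute-differences error come from the description.

```python
import numpy as np

def spectrum(signal, sr, win_s=0.5):
    # Magnitude spectrum of a fixed-length window via real FFT.
    # (win_s = 0.5 s is an assumed window length.)
    n = int(sr * win_s)
    return np.abs(np.fft.rfft(signal[:n], n=n))

def normalize(spec):
    # Scale the spectrum to unit sum so pieces at different
    # volumes are comparable.
    total = np.sum(spec)
    return spec / total if total > 0 else spec

def match(query, track, sr, hop_s=0.1, win_s=0.5):
    # Slide a window over the track in 0.1-second steps and return
    # (best_offset_seconds, lowest_error), where the error is the
    # sum of absolute differences between the normalized spectra.
    q = normalize(spectrum(query, sr, win_s))
    hop, win = int(sr * hop_s), int(sr * win_s)
    best_offset, best_err = None, np.inf
    for start in range(0, len(track) - win + 1, hop):
        seg = normalize(spectrum(track[start:start + win], sr, win_s))
        err = np.sum(np.abs(q - seg))
        if err < best_err:
            best_offset, best_err = start / sr, err
    return best_offset, best_err

# Demo: a 440 Hz query hidden in a track that plays 330 Hz for the
# first second, then 440 Hz for the second.
sr = 8000
t = np.arange(sr) / sr
track = np.concatenate([np.sin(2 * np.pi * 330 * t),
                        np.sin(2 * np.pi * 440 * t)])
query = np.sin(2 * np.pi * 440 * np.arange(int(0.5 * sr)) / sr)
offset, err = match(query, track, sr)
```

In this toy run the lowest error falls inside the 440 Hz region (offset near 1.0 s), which mirrors the single-note successes reported above; with many simultaneous notes, the per-bin differences across all harmonics would accumulate into a much larger error.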