A music mashup combines audio elements from two or more songs to create a new work. To reduce the time and effort required to make mashups, researchers have developed algorithms that predict the compatibility of audio elements. Prior work has focused on mixing unaltered excerpts, but advances in source separation make it possible to create mashups from isolated stems (e.g., vocals, drums, bass). In this work, we use separated stems not only to create mashups but also to train a model that predicts the mutual compatibility of groups of excerpts, using self-supervised and semi-supervised methods. Specifically, we first build a random mashup creation pipeline that combines stem tracks obtained via source separation, automatically adjusting key and tempo to match, since matching key and tempo are prerequisites for a high-quality mashup. To train a model to predict compatibility, we use stem tracks obtained from the same song as positive examples, and random combinations of stems with key and/or tempo left unadjusted as negative examples. To improve the model and make use of more data, we also train on "average" examples: random combinations with matching key and tempo, which we treat as unlabeled data because their true compatibility is unknown. To determine whether the combined signal or the set of individual stem signals is more indicative of the quality of the result, we experiment with two model architectures and train them using a semi-supervised learning technique. Finally, we conduct objective and subjective evaluations of the system, comparing it against a standard rule-based system.
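The three kinds of training examples described above can be illustrated with a minimal sketch. It only shows how labels might be assigned; source separation, key shifting, and tempo stretching are omitted, and every name here (`Stem`, `Example`, `make_example`, the `mode` strings) is a hypothetical placeholder rather than part of the described system.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Stem:
    song_id: str  # which song the stem was separated from
    kind: str     # e.g. "vocals", "drums", "bass"


@dataclass(frozen=True)
class Example:
    stems: Tuple[Stem, ...]
    key_tempo_matched: bool  # whether key and tempo agree across stems
    label: Optional[int]     # 1 = positive, 0 = negative, None = unlabeled


def make_example(library: Sequence[Stem], rng: random.Random, mode: str) -> Example:
    """Build one example (assumed scheme, for illustration only)."""
    songs = sorted({s.song_id for s in library})
    kinds = sorted({s.kind for s in library})

    if mode == "positive":
        # Stems from the same song: compatible by construction.
        song = rng.choice(songs)
        stems = tuple(s for s in library if s.song_id == song)
        return Example(stems, key_tempo_matched=True, label=1)

    if mode not in ("negative", "unlabeled"):
        raise ValueError(f"unknown mode: {mode}")
    if len(songs) < 2 or len(kinds) < 2:
        raise ValueError("need at least two songs and two stem kinds")

    # Random combination: one stem per kind, drawn from more than one song.
    while True:
        stems = tuple(
            rng.choice([s for s in library if s.kind == k]) for k in kinds
        )
        if len({s.song_id for s in stems}) > 1:
            break

    if mode == "negative":
        # Key and/or tempo left unadjusted: treated as incompatible.
        return Example(stems, key_tempo_matched=False, label=0)
    # Key and tempo matched, true compatibility unknown: unlabeled.
    return Example(stems, key_tempo_matched=True, label=None)
```

In this sketch the unlabeled ("average") examples carry `label=None`, so a semi-supervised training loop could route them to an unsupervised loss term while the labeled examples feed a standard supervised loss.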