Parallel to rapid advancements in foundation model research, the past few years have witnessed a surge in music AI applications. As AI-generated and AI-augmented music become increasingly mainstream, many researchers in the music AI community may wonder: what research frontiers remain unexplored? This paper outlines several key areas within music AI research that present significant opportunities for further investigation. We begin by examining foundational representation models and highlight emerging efforts toward explainability and interpretability. We then discuss the evolution toward multimodal systems, provide an overview of the current landscape of music datasets and their limitations, and address the growing importance of model efficiency in both training and deployment. Next, we explore applied directions, focusing first on generative models. We review recent systems, their computational constraints, and persistent challenges related to evaluation and controllability. We then examine extensions of these generative approaches to multimodal settings and their integration into artists' workflows, including applications in music editing, captioning, production, transcription, source separation, performance, discovery, and education. Finally, we explore copyright implications of generative music and propose strategies to safeguard artist rights. While not exhaustive, this survey aims to illuminate promising research directions enabled by recent developments in music foundation models.