Visuals are a core part of our experience of music, owing to the way they can amplify the emotions and messages conveyed through the music. However, creating music visualizations is a complex, time-consuming, and resource-intensive process. We introduce Generative Disco, a generative AI system that helps generate music visualizations with large language models and text-to-image models. Users select intervals of music to visualize and then parameterize each visualization by defining a start prompt and an end prompt. The system warps between these prompts and generates images in time with the beat of the music, producing audio-reactive video. We introduce design patterns for improving generated videos: "transitions", which express shifts in color, time, subject, or style, and "holds", which encourage visual emphasis and consistency. A study with professionals showed that the system was enjoyable, easy to explore, and highly expressive. We conclude by discussing use cases of Generative Disco for professionals and how AI-generated content is changing the landscape of creative work.
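As a rough illustration of what warping between a start and an end prompt on the beat could look like, the following Python sketch computes a blend weight for each frame of a selected interval. It is not taken from Generative Disco: the function name `beat_synced_weights`, the frame rate, and the rule that the blend advances by one equal step per beat segment are all assumptions made for illustration only.

```python
from bisect import bisect_right


def beat_synced_weights(start, end, beats, fps=4):
    """Return (time, weight) pairs for frames in the interval [start, end].

    weight is 0.0 at the start prompt and 1.0 at the end prompt.
    Assumption (not from the paper): the blend advances by one equal
    step per beat segment, so the visual change follows the beat grid
    rather than wall-clock time.
    """
    if end <= start:
        raise ValueError("end must be greater than start")
    # Anchor points: interval boundaries plus any beats strictly inside it.
    anchors = [start] + sorted(b for b in beats if start < b < end) + [end]
    segments = len(anchors) - 1
    n_frames = int(round((end - start) * fps)) + 1
    result = []
    for i in range(n_frames):
        t = min(start + i / fps, end)
        # Index of the beat segment that contains time t.
        k = min(bisect_right(anchors, t) - 1, segments - 1)
        # Position of t inside that segment, from 0.0 to 1.0.
        local = (t - anchors[k]) / (anchors[k + 1] - anchors[k])
        result.append((t, (k + local) / segments))
    return result


if __name__ == "__main__":
    # Hypothetical 2-second interval with a single beat at 0.5 s.
    for t, w in beat_synced_weights(0.0, 2.0, beats=[0.5], fps=2):
        print(f"t={t:.2f}s  weight={w:.3f}")
```

In a full pipeline, each weight would presumably control how far a text-to-image model's conditioning has moved from the start prompt toward the end prompt for that frame. With the single beat at 0.5 s in the example, the blend reaches its halfway point exactly on that beat and spends the remaining 1.5 s on the second half.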