The recent surge in popularity of diffusion models for image generation has brought new attention to the potential of these models in other areas of media generation. One area that has yet to be fully explored is the application of diffusion models to audio generation. Audio generation requires an understanding of multiple aspects, such as the temporal dimension, long-term structure, multiple layers of overlapping sounds, and nuances that only trained listeners can detect. In this work, we investigate the potential of diffusion models for audio generation. We propose a set of models that address these aspects, including a new method for text-conditional latent audio diffusion with stacked 1D U-Nets that can generate multiple minutes of music from a textual description. For each model, we aim to maintain reasonable inference speed, targeting real-time generation on a single consumer GPU. In addition to trained models, we provide a collection of open-source libraries with the hope of simplifying future work in the field. Samples are available at https://bit.ly/audio-diffusion. Code is available at https://github.com/archinetai/audio-diffusion-pytorch.
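To make the core idea concrete, below is a minimal illustrative sketch (not the authors' implementation and not the API of the linked library) of a text-conditional 1D diffusion denoising step in PyTorch. The tiny convolutional network stands in for the stacked 1D U-Nets described above, and `text_emb` stands in for the output of a real text encoder; the cosine noise schedule and epsilon-prediction objective are standard choices assumed here for illustration.

```python
# Illustrative sketch only: a toy text-conditional 1D diffusion training step.
import torch
import torch.nn as nn

class TinyUNet1d(nn.Module):
    """Toy stand-in for a stacked 1D U-Net operating on waveforms or latents."""
    def __init__(self, channels=32, cond_dim=64):
        super().__init__()
        self.down = nn.Conv1d(1, channels, kernel_size=4, stride=2, padding=1)
        self.cond_proj = nn.Linear(cond_dim, channels)   # inject text conditioning
        self.time_proj = nn.Linear(1, channels)          # inject diffusion time / noise level
        self.up = nn.ConvTranspose1d(channels, 1, kernel_size=4, stride=2, padding=1)

    def forward(self, x_noisy, t, text_emb):
        h = torch.relu(self.down(x_noisy))
        h = h + self.cond_proj(text_emb).unsqueeze(-1) \
              + self.time_proj(t.unsqueeze(-1)).unsqueeze(-1)
        return self.up(h)  # predicted noise, same shape as the input

# One training step with the standard noise-prediction objective.
model = TinyUNet1d()
x0 = torch.randn(8, 1, 2048)     # clean audio (or latent) batch: [batch, channels, time]
text_emb = torch.randn(8, 64)    # hypothetical text-encoder output
t = torch.rand(8)                # diffusion times in [0, 1]
noise = torch.randn_like(x0)
alpha = torch.cos(t * torch.pi / 2).view(-1, 1, 1)   # simple cosine schedule
sigma = torch.sin(t * torch.pi / 2).view(-1, 1, 1)
x_noisy = alpha * x0 + sigma * noise
loss = ((model(x_noisy, t, text_emb) - noise) ** 2).mean()
loss.backward()
```

In a latent formulation, `x0` would be the output of an audio autoencoder rather than a raw waveform, which is what keeps inference fast enough to target real-time generation on a single consumer GPU.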