Audio editing serves various purposes, such as adding background sound effects, replacing a musical instrument, and repairing damaged audio. Recently, some diffusion-based methods have achieved zero-shot audio editing by running a diffusion and denoising process conditioned on a text description of the output audio. However, these methods still have several problems: 1) they are not trained on editing tasks and cannot ensure good editing quality; 2) they can erroneously modify audio segments that do not require editing; 3) they require a complete description of the output audio, which is not always available or necessary in practical scenarios. In this work, we propose AUDIT, an instruction-guided audio editing model based on latent diffusion models. Specifically, AUDIT has three main design features: 1) we construct triplet training data (instruction, input audio, output audio) for different audio editing tasks and train a diffusion model that takes the instruction and the input (to-be-edited) audio as conditions and generates the output (edited) audio; 2) it automatically learns to modify only the segments that need editing by comparing the difference between the input and output audio; 3) it requires only an edit instruction, rather than a full description of the target audio, as text input. AUDIT achieves state-of-the-art results on both objective and subjective metrics for several audio editing tasks (e.g., adding, dropping, replacement, inpainting, super-resolution). Demo samples are available at https://audit-demo.github.io/.
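The triplet-based training setup above can be sketched as follows. This is a minimal, hedged illustration (not the authors' code): latents are toy NumPy vectors, the "denoiser" is a linear map, and all names (`embed_instruction`, `denoiser`, `training_step`) are hypothetical stand-ins for the real text encoder, U-Net, and training loop. It shows only the shape of the objective: predict the noise added to the output-audio latent, conditioned on the edit instruction and the input-audio latent.

```python
# Hedged sketch of instruction-conditioned latent-diffusion training on a
# (instruction, input audio, output audio) triplet. Toy components throughout.
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM, TEXT_DIM = 16, 8

def embed_instruction(instruction: str) -> np.ndarray:
    """Stand-in for a real text encoder; here a deterministic random projection."""
    seed = abs(hash(instruction)) % (2**32)
    return np.random.default_rng(seed).standard_normal(TEXT_DIM)

def denoiser(z_t, z_in, text_emb, t, W):
    """Toy 'network': a linear map over the concatenated conditions.
    Conditioning = noised output latent + input-audio latent + instruction."""
    x = np.concatenate([z_t, z_in, text_emb, [t]])
    return W @ x  # predicts the noise that was added to z_t

def training_step(instruction, z_in, z_out, W):
    t = rng.uniform(0.0, 1.0)                          # diffusion timestep
    eps = rng.standard_normal(LATENT_DIM)              # sampled Gaussian noise
    z_t = np.sqrt(1.0 - t) * z_out + np.sqrt(t) * eps  # noised output latent
    pred = denoiser(z_t, z_in, embed_instruction(instruction), t, W)
    return float(np.mean((pred - eps) ** 2))           # epsilon-prediction MSE

# One triplet: the output latent differs from the input only where the edit
# applies, which is how the model can learn to leave other segments untouched.
z_in = rng.standard_normal(LATENT_DIM)
z_out = z_in.copy()
z_out[:4] += 1.0  # the "edited" region
W = rng.standard_normal((LATENT_DIM, 2 * LATENT_DIM + TEXT_DIM + 1)) * 0.01
loss = training_step("Add a dog barking in the background", z_in, z_out, W)
```

Because the supervision pairs an input latent with an output latent that differs only in the edited region, gradient descent on this objective pushes the model to reconstruct unedited segments verbatim, matching design feature 2) above.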