SAO-Instruct: Free-form Audio Editing using Natural Language Instructions

Generative models have made significant progress in synthesizing high-fidelity audio from short textual descriptions. However, editing existing audio using natural language has remained largely underexplored. Current approaches either require the complete description of the edited audio or are constrained to predefined edit instructions that lack flexibility. In this work, we introduce SAO-Instruct, a model based on Stable Audio Open capable of editing audio clips using any free-form natural language instruction. To train our model, we create a dataset of audio editing triplets (input audio, edit instruction, output audio) using Prompt-to-Prompt, DDPM inversion, and a manual editing pipeline. Although partially trained on synthetic data, our model generalizes well to real in-the-wild audio clips and unseen edit instructions. We demonstrate that SAO-Instruct achieves competitive performance on objective metrics and outperforms other audio editing approaches in a subjective listening study. To encourage future research, we release our code and model weights.
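The abstract names DDPM inversion as one of the ways the editing triplets are built, but it does not say how inversion works. Below is a minimal sketch of one widely used variant, "edit-friendly" DDPM inversion. It runs on a toy vector in place of an audio latent. It is an assumption that SAO-Instruct uses this exact variant. The schedule, `predict_noise`, `posterior_mean`, `invert`, `sample`, and the scalar `cond` standing in for a text prompt are all hypothetical placeholders and are not part of Stable Audio Open's API.

```python
# Toy sketch of edit-friendly DDPM inversion (not SAO-Instruct / Stable Audio Open code).
import math
import operator
import random
from itertools import accumulate

T = 50  # number of diffusion steps (toy value)
# Linear beta schedule; list index t-1 holds the value for diffusion step t.
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]
alphas = [1.0 - b for b in betas]
alpha_bars = list(accumulate(alphas, operator.mul))  # alpha_bar_t = prod_{s<=t} alpha_s


def predict_noise(x_t, t, cond):
    """Hypothetical stand-in for a trained text-conditioned noise predictor eps_theta(x_t, t, c)."""
    return [math.tanh(v + cond) * t / T for v in x_t]


def posterior_mean(x_t, t, cond):
    """DDPM mean: mu_t = (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps_hat) / sqrt(alpha_t)."""
    i = t - 1
    eps_hat = predict_noise(x_t, t, cond)
    coef = betas[i] / math.sqrt(1.0 - alpha_bars[i])
    return [(v - coef * e) / math.sqrt(alphas[i]) for v, e in zip(x_t, eps_hat)]


def invert(x0, cond, rng):
    """Return x_T and noise maps z_1..z_T so that DDPM sampling with `cond` reproduces x0."""
    # Draw every x_t independently from q(x_t | x_0), not as one noising trajectory.
    xs = [x0]
    for t in range(1, T + 1):
        ab = alpha_bars[t - 1]
        xs.append([math.sqrt(ab) * v + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0) for v in x0])
    # Solve x_{t-1} = mu_t(x_t) + sigma_t * z_t for z_t, using sigma_t = sqrt(beta_t).
    zs = {}
    for t in range(T, 0, -1):
        sigma = math.sqrt(betas[t - 1])
        mu = posterior_mean(xs[t], t, cond)
        zs[t] = [(prev - m) / sigma for prev, m in zip(xs[t - 1], mu)]
    return xs[T], zs


def sample(x_T, zs, cond):
    """DDPM ancestral sampling that reuses the extracted noise maps instead of fresh noise."""
    x = x_T
    for t in range(T, 0, -1):
        sigma = math.sqrt(betas[t - 1])
        x = [m + sigma * z for m, z in zip(posterior_mean(x, t, cond), zs[t])]
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    x0 = [rng.gauss(0.0, 1.0) for _ in range(16)]  # toy stand-in for an audio latent
    x_T, zs = invert(x0, cond=0.0, rng=rng)
    recon = sample(x_T, zs, cond=0.0)   # same condition: reproduces x0 up to float error
    edited = sample(x_T, zs, cond=0.5)  # changed condition: an "edited" variant of x0
    print("reconstruction error:", max(abs(a - b) for a, b in zip(recon, x0)))
    print("edit distance:", max(abs(a - b) for a, b in zip(edited, x0)))
```

When the condition is unchanged, sampling retraces the stored trajectory, so the reconstruction error should stay at floating-point level. The intuition behind drawing each x_t independently is that the extracted z_t then carry much of the input's specific structure. Resampling with a different condition can therefore keep the input's character while following the new prompt. In a real pipeline, the toy predictor would be replaced by the diffusion model's conditioned noise estimate.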
