We introduce AudioLM, a framework for high-quality audio generation with long-term consistency. AudioLM maps the input audio to a sequence of discrete tokens and casts audio generation as a language modeling task in this representation space. We show how existing audio tokenizers provide different trade-offs between reconstruction quality and long-term structure, and we propose a hybrid tokenization scheme to achieve both objectives. Namely, we leverage the discretized activations of a masked language model pre-trained on audio to capture long-term structure and the discrete codes produced by a neural audio codec to achieve high-quality synthesis. By training on large corpora of raw audio waveforms, AudioLM learns to generate natural and coherent continuations given short prompts. When trained on speech, and without any transcript or annotation, AudioLM generates syntactically and semantically plausible speech continuations while also maintaining speaker identity and prosody for unseen speakers. Furthermore, we demonstrate how our approach extends beyond speech by generating coherent piano music continuations, despite being trained without any symbolic representation of music.
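To make the hybrid tokenization concrete, the sketch below illustrates the idea in Python. It is a minimal illustration, not the paper's implementation: `semantic_encoder` and `codec_encode` are hypothetical stand-ins for the two pre-trained models (the paper instantiates these with a w2v-BERT masked language model and the SoundStream neural codec), and the codebook, frame rates, and shapes are all assumed for illustration.

```python
import numpy as np

def kmeans_quantize(features, codebook):
    """Map each frame embedding to the index of its nearest codebook centroid.

    AudioLM derives its "semantic tokens" this way: activations of a masked
    language model pre-trained on audio are discretized with k-means.
    """
    # (T, 1, D) - (1, K, D) -> (T, K) squared distances, argmin over centroids
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)  # (T,) semantic token ids

# --- Hypothetical stand-ins for the two pre-trained models -----------------
def semantic_encoder(waveform):
    """Stand-in for a masked-LM audio encoder (w2v-BERT in the paper):
    one D-dim embedding per frame. Random values for illustration only."""
    rng = np.random.default_rng(0)
    n_frames = len(waveform) // 320           # assume 16 kHz audio, 20 ms hop
    return rng.normal(size=(n_frames, 64))    # (T_sem, D)

def codec_encode(waveform):
    """Stand-in for a neural codec encoder (SoundStream in the paper):
    residual-VQ code indices, several codebooks per frame."""
    rng = np.random.default_rng(1)
    n_frames = len(waveform) // 320
    return rng.integers(0, 1024, size=(n_frames, 4))  # (T_ac, n_quantizers)

# --- Hybrid tokenization: one discrete sequence for language modeling ------
waveform = np.zeros(16000)                               # 1 s of (silent) audio
codebook = np.random.default_rng(2).normal(size=(512, 64))  # k-means centroids

semantic_tokens = kmeans_quantize(semantic_encoder(waveform), codebook)
acoustic_tokens = codec_encode(waveform).reshape(-1)

# AudioLM then trains autoregressive models on such discrete sequences,
# predicting acoustic tokens conditioned on semantic tokens, so that the
# semantic stream carries long-term structure and the acoustic stream
# carries the detail needed for high-quality synthesis.
tokens = np.concatenate([semantic_tokens, acoustic_tokens])
print(tokens.shape, tokens[:8])
```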