Onuma-to-波对波:用Onotopooceic 单词进行无害环境合成 (Onoma-to-wave: Environmental sound synthesis from onomatopoeic words) - 专知论文

会员服务 ·

0

回合 · 控制器 · Better · 标注 · 相同 ·

2021 年 2 月 11 日

Onoma-to-wave: Environmental sound synthesis from onomatopoeic words

翻译：Onuma-to-波对波:用Onotopooceic 单词进行无害环境合成

Yuki Okamoto,Keisuke Imoto,Shinnosuke Takamichi,Ryosuke Yamanishi,Takahiro Fukumori,Yoichi Yamashita

from arxiv, Submitted to IJCNN2021

In this paper, we propose a new framework for environmental sound synthesis using onomatopoeic words and sound event labels. The conventional method of environmental sound synthesis, in which only sound event labels are used, cannot finely control the time-frequency structural features of synthesized sounds, such as sound duration, timbre, and pitch. There are various ways to express environmental sound other than sound event labels, such as the use of onomatopoeic words. An onomatopoeic word, which is a character sequence for phonetically imitating a sound, has been shown to be effective for describing the phonetic feature of sounds. We believe that environmental sound synthesis using onomatopoeic words will enable us to control the fine time-frequency structural features of synthesized sounds, such as sound duration, timbre, and pitch. In this paper, we thus propose environmental sound synthesis from onomatopoeic words on the basis of a sequence-to-sequence framework. To convert onomatopoeic words to environmental sound, we use a sequence-to-sequence framework. We also propose a method of environmental sound synthesis using onomatopoeic words and sound event labels to control the fine time-frequency structure and frequency property of synthesized sounds. Our subjective experiments show that the proposed method achieves the same level of sound quality as the conventional method using WaveNet. Moreover, our methods are better than the conventional method in terms of the expressiveness of synthesized sounds to onomatopoeic words.

翻译：在本文中, 我们提出一个新的环境健全合成框架。常规环境健全合成方法( 仅使用合理的事件标签 ) 无法精密控制合成声音的时间- 频率结构特征, 例如声音持续时间、 Timbre 和音调。本文中, 有多种方法可以表达环境健全而非无害事件标签, 比如使用在线语言。一个 Ocomotopooe 字词( 是音调合成声音的字符序列 ) 已被证明有效描述声音的音调特征。我们认为, 使用有声调标志的音调合成传统方法将使我们能够控制合成声音的精细时间- 频率结构特征, 比如音调持续时间、 Timbre 和音调。因此, 在本文中, 我们建议用有声调的词来表达环境声音合成, 将有色调的词转换为有声调的音调词, 我们用一个从音调的直线词框架。我们还建议一种使用有色的音调合成方法, 将我们所使用的频率分析方法, 将我们所使用的音调分析方法的音调分析方法。

0

相关内容

【文献综述】Text Detection and Recognition in the Wild: A Review 自然文本检测与识别

【文献综述】Text Detection and Recognition in the Wild: A Review 自然文本检测与识别

专知会员服务

46+阅读 · 2020年6月11日

神经网络与形式语言综述，12页pdf，A Survey of Neural Networks and Formal Languages

神经网络与形式语言综述，12页pdf，A Survey of Neural Networks and Formal Languages

专知会员服务

21+阅读 · 2020年6月4日

【IJCAI2020】统计相关模型，A Complete Characterization of Projectivity for Statistical Relational Models

【IJCAI2020】统计相关模型，A Complete Characterization of Projectivity for Statistical Relational Models

专知会员服务

20+阅读 · 2020年4月25日

【InterSpeech2020】混合语音识别系统中的词汇扩展技术，Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

【InterSpeech2020】混合语音识别系统中的词汇扩展技术，Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

专知会员服务

17+阅读 · 2020年3月23日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【MIT】时间序列GAN，Subadditivity of Probability Divergences

专知会员服务

63+阅读 · 2020年3月4日

【MIT】生成模型提出的分子的可合成性，48页pdf,The Synthesizability of Molecules Proposed by Generative Models

【MIT】生成模型提出的分子的可合成性，48页pdf,The Synthesizability of Molecules Proposed by Generative Models

专知会员服务

28+阅读 · 2020年2月20日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【论文推荐】小样本视频合成，Few-shot Video-to-Video Synthesis

【论文推荐】小样本视频合成，Few-shot Video-to-Video Synthesis

专知会员服务

24+阅读 · 2019年12月15日

【用十亿级半监督学习实现最先进图像与视频分类】《Billion-scale semi-supervised learning for state-of-the-art image and video classification | Facebook》

【用十亿级半监督学习实现最先进图像与视频分类】《Billion-scale semi-supervised learning for state-of-the-art image and video classification | Facebook》

专知会员服务

16+阅读 · 2019年10月21日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

已删除

将门创投

13+阅读 · 2019年4月17日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Facebook PyText 在 Github 上开源了

Facebook PyText 在 Github 上开源了

AINLP

7+阅读 · 2018年12月14日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

现代情感分析方法

现代情感分析方法

算法与数学之美

14+阅读 · 2018年1月12日

gan生成图像at 1024² 的代码论文

gan生成图像at 1024² 的代码论文

CreateAMind

4+阅读 · 2017年10月31日

MoCoGAN 分解运动和内容的视频生成

MoCoGAN 分解运动和内容的视频生成

CreateAMind

18+阅读 · 2017年10月21日

Generative Adversarial Text to Image Synthesis论文解读

Generative Adversarial Text to Image Synthesis论文解读

统计学习与视觉计算组

13+阅读 · 2017年6月9日

Acted vs. Improvised: Domain Adaptation for Elicitation Approaches in Audio-Visual Emotion Recognition

Arxiv

0+阅读 · 2021年4月5日

Reinforcement Learning for Emotional Text-to-Speech Synthesis with Improved Emotion Discriminability

Arxiv

0+阅读 · 2021年4月3日

Text Detection and Recognition in the Wild: A Review

Arxiv

20+阅读 · 2020年6月8日

Neural Response Generation with Meta-Words

Neural Response Generation with Meta-Words

Arxiv

6+阅读 · 2019年6月14日

Deep learning for time series classification: a review

Arxiv

12+阅读 · 2019年3月14日

Dynamic Transfer Learning for Named Entity Recognition

Dynamic Transfer Learning for Named Entity Recognition

Arxiv

3+阅读 · 2018年12月13日

Automatic Face Aging in Videos via Deep Reinforcement Learning

Arxiv

4+阅读 · 2018年11月27日

Speech waveform synthesis from MFCC sequences with generative adversarial networks

Arxiv

5+阅读 · 2018年4月3日

Learning View-Specific Deep Networks for Person Re-Identification

Arxiv

7+阅读 · 2018年3月30日

Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification

Arxiv

4+阅读 · 2017年11月27日

VIP会员

文章信息

相关主题

相关VIP内容

【文献综述】Text Detection and Recognition in the Wild: A Review 自然文本检测与识别

【文献综述】Text Detection and Recognition in the Wild: A Review 自然文本检测与识别

专知会员服务

46+阅读 · 2020年6月11日

神经网络与形式语言综述，12页pdf，A Survey of Neural Networks and Formal Languages

神经网络与形式语言综述，12页pdf，A Survey of Neural Networks and Formal Languages

专知会员服务

21+阅读 · 2020年6月4日

【IJCAI2020】统计相关模型，A Complete Characterization of Projectivity for Statistical Relational Models

【IJCAI2020】统计相关模型，A Complete Characterization of Projectivity for Statistical Relational Models

专知会员服务

20+阅读 · 2020年4月25日

【InterSpeech2020】混合语音识别系统中的词汇扩展技术，Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

【InterSpeech2020】混合语音识别系统中的词汇扩展技术，Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

专知会员服务

17+阅读 · 2020年3月23日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【MIT】时间序列GAN，Subadditivity of Probability Divergences

专知会员服务

63+阅读 · 2020年3月4日

【MIT】生成模型提出的分子的可合成性，48页pdf,The Synthesizability of Molecules Proposed by Generative Models

【MIT】生成模型提出的分子的可合成性，48页pdf,The Synthesizability of Molecules Proposed by Generative Models

专知会员服务

28+阅读 · 2020年2月20日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【论文推荐】小样本视频合成，Few-shot Video-to-Video Synthesis

【论文推荐】小样本视频合成，Few-shot Video-to-Video Synthesis

专知会员服务

24+阅读 · 2019年12月15日

【用十亿级半监督学习实现最先进图像与视频分类】《Billion-scale semi-supervised learning for state-of-the-art image and video classification | Facebook》

【用十亿级半监督学习实现最先进图像与视频分类】《Billion-scale semi-supervised learning for state-of-the-art image and video classification | Facebook》

专知会员服务

16+阅读 · 2019年10月21日

热门VIP内容

开通专知VIP会员享更多权益服务

Nature综述：金融网络中的物理学

【CMU博士论文】通信高效且差分隐私的优化方法

《数字孪生与生成式AI融合构建战术网络弹性边缘智能》

新质生成式AI赋能产业变革的实践与路径

相关资讯

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

已删除

将门创投

13+阅读 · 2019年4月17日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Facebook PyText 在 Github 上开源了

Facebook PyText 在 Github 上开源了

AINLP

7+阅读 · 2018年12月14日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

现代情感分析方法

现代情感分析方法

算法与数学之美

14+阅读 · 2018年1月12日

gan生成图像at 1024² 的代码论文

gan生成图像at 1024² 的代码论文

CreateAMind

4+阅读 · 2017年10月31日

MoCoGAN 分解运动和内容的视频生成

MoCoGAN 分解运动和内容的视频生成

CreateAMind

18+阅读 · 2017年10月21日

Generative Adversarial Text to Image Synthesis论文解读

Generative Adversarial Text to Image Synthesis论文解读

统计学习与视觉计算组

13+阅读 · 2017年6月9日

相关论文

Acted vs. Improvised: Domain Adaptation for Elicitation Approaches in Audio-Visual Emotion Recognition

Arxiv

0+阅读 · 2021年4月5日

Reinforcement Learning for Emotional Text-to-Speech Synthesis with Improved Emotion Discriminability

Arxiv

0+阅读 · 2021年4月3日

Text Detection and Recognition in the Wild: A Review

Arxiv

20+阅读 · 2020年6月8日

Neural Response Generation with Meta-Words

Neural Response Generation with Meta-Words

Arxiv

6+阅读 · 2019年6月14日

Deep learning for time series classification: a review

Arxiv

12+阅读 · 2019年3月14日

Dynamic Transfer Learning for Named Entity Recognition

Dynamic Transfer Learning for Named Entity Recognition

Arxiv

3+阅读 · 2018年12月13日

Automatic Face Aging in Videos via Deep Reinforcement Learning

Arxiv

4+阅读 · 2018年11月27日

Speech waveform synthesis from MFCC sequences with generative adversarial networks

Arxiv

5+阅读 · 2018年4月3日

Learning View-Specific Deep Networks for Person Re-Identification

Arxiv

7+阅读 · 2018年3月30日

Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification

Arxiv

4+阅读 · 2017年11月27日

微信扫码咨询专知VIP会员