In audio processing, there is strong demand for generating expressive sounds from high-level representations. Such representations can be used to manipulate timbre and guide the synthesis of creative instrumental notes. Modern algorithms, such as neural networks, have inspired the development of expressive synthesizers based on musical-instrument timbre compression. Unsupervised deep learning methods can achieve audio compression by training a network to learn a mapping from waveforms or spectrograms to low-dimensional representations. This study investigates stacked convolutional autoencoders for compressing time-frequency audio representations of a variety of instruments at a single pitch. We further explore hyperparameters and regularization techniques to improve on the initial design. Trained without supervision, the network is able to reconstruct a monophonic, harmonic sound from its latent representation. In addition, we introduce an evaluation metric that measures the similarity between the original and reconstructed samples. Evaluating a deep generative model for sound synthesis is a challenging task; our approach is based on the accuracy of the generated frequencies, a significant metric for the perception of harmonic sounds. This work is expected to accelerate future experiments on audio compression with neural autoencoders.
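The abstract does not specify the network architecture in detail. As a minimal illustrative sketch, the idea of a stacked convolutional autoencoder on a time-frequency representation can be shown with plain numpy: strided convolutions progressively compress a spectrogram into a small latent map, and the decoder mirrors this with upsampling followed by convolution. All shapes, kernel sizes, and weights below are assumptions for illustration, not the paper's actual design, and training is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(x, w, stride=2):
    # valid 2-D convolution with stride; x: (H, W), w: (kh, kw)
    kh, kw = w.shape
    H = (x.shape[0] - kh) // stride + 1
    W = (x.shape[1] - kw) // stride + 1
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            patch = x[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * w)
    return out

def upsample(x, factor=2):
    # nearest-neighbour upsampling, the decoder's counterpart to striding
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

# toy magnitude spectrogram: 64 frequency bins x 64 time frames (hypothetical size)
spec = np.abs(rng.standard_normal((64, 64)))

# two stacked strided conv layers compress 64x64 down to a 15x15 latent map
w1 = rng.standard_normal((3, 3)) * 0.1
w2 = rng.standard_normal((3, 3)) * 0.1
h1 = np.maximum(conv2d(spec, w1, stride=2), 0)      # ReLU non-linearity
latent = np.maximum(conv2d(h1, w2, stride=2), 0)

# decoder mirrors the encoder: upsample twice, then convolve
# (padding is omitted here, so the reconstruction is slightly smaller than the input)
recon = conv2d(upsample(upsample(latent)), w1, stride=1)
```

In a real implementation the decoder would use learned transposed convolutions and the weights would be trained to minimize a reconstruction loss between `spec` and `recon`; this sketch only shows the compress-then-expand data flow.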
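The abstract describes an evaluation metric based on the accuracy of the generated frequencies but does not give its formula. One plausible sketch, offered purely as an assumption of what such a metric might look like, compares the dominant spectral peaks of the original and reconstructed signals; the function names, peak count, and test tones below are all hypothetical.

```python
import numpy as np

def peak_frequencies(signal, sr, n_peaks=2):
    # return the frequencies (in Hz, rounded) of the n largest magnitude-spectrum bins
    mag = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    idx = np.argsort(mag)[-n_peaks:]
    return set(np.round(freqs[idx]).astype(int))

def frequency_accuracy(original, reconstructed, sr, n_peaks=2):
    # fraction of the original's dominant frequencies recovered in the reconstruction
    ref = peak_frequencies(original, sr, n_peaks)
    hyp = peak_frequencies(reconstructed, sr, n_peaks)
    return len(ref & hyp) / len(ref)

# toy harmonic note: fundamental at 220 Hz plus one overtone at 440 Hz
sr = 16000
t = np.arange(sr) / sr
note = np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 440 * t)
other = np.sin(2 * np.pi * 330 * t)  # an unrelated tone for comparison
```

A perfect reconstruction scores 1.0 under this sketch, and a reconstruction with no shared dominant frequency scores 0.0; a real metric would likely add tolerance windows for near-miss frequencies.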