While active research in audio compression techniques has yielded substantial breakthroughs, spectral reconstruction of low-quality audio waves remains a less explored topic. In this paper, we propose a novel approach for reconstructing higher frequencies from considerably longer sequences of low-quality MP3 audio waves. Our technique inpaints audio spectrograms with residually stacked autoencoder blocks, manipulating individual amplitude and phase values in relation to perceptual differences. The architecture contains several bottlenecks while preserving the spectral structure of the audio wave via skip connections. We also compare several task metrics and present a visual guide to loss selection. Moreover, we show how differential quantization techniques can reduce the initial model size by more than half while also reducing inference time, which is crucial in real-world applications.