Implicit neural representations (INRs) are a rapidly growing research area that provides alternative ways to represent multimedia signals. Recent applications of INRs include image super-resolution, compression of high-dimensional signals, and 3D rendering. However, these solutions usually focus on visual data, and adapting them to the audio domain is not trivial. Moreover, they typically require a separately trained model for every data sample. To address this limitation, we propose HyperSound, a meta-learning method leveraging hypernetworks to produce INRs for audio signals unseen at training time. We show that our approach can reconstruct sound waves with quality comparable to other state-of-the-art models.
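To make the hypernetwork-plus-INR idea concrete, below is a minimal sketch in PyTorch. Every name, layer size, and activation here is an illustrative assumption rather than the paper's actual HyperSound architecture: a small convolutional encoder maps a raw waveform to an embedding, and linear heads emit the weights of a SIREN-style INR that maps time coordinates to amplitudes.

```python
# Illustrative sketch only: a hypernetwork that, given a raw waveform,
# emits the parameters of a tiny single-hidden-layer INR t -> amplitude.
# Layer sizes, the encoder design, and the sine activation are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

HIDDEN = 64  # hypothetical hidden width of the generated INR

class WaveformHypernetwork(nn.Module):
    def __init__(self, emb: int = 128):
        super().__init__()
        # Encoder: raw waveform -> fixed-size embedding (assumed design).
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=64, stride=16), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=16, stride=8), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(32, emb), nn.ReLU(),
        )
        # Heads emitting the target INR's parameters:
        # layer 1 maps (1 -> HIDDEN), layer 2 maps (HIDDEN -> 1).
        self.w1 = nn.Linear(emb, HIDDEN)  # first-layer weights
        self.b1 = nn.Linear(emb, HIDDEN)  # first-layer biases
        self.w2 = nn.Linear(emb, HIDDEN)  # second-layer weights
        self.b2 = nn.Linear(emb, 1)       # second-layer bias

    def forward(self, wave: torch.Tensor):
        """wave: (batch, n_samples) -> per-sample INR parameters."""
        z = self.encoder(wave.unsqueeze(1))
        return self.w1(z), self.b1(z), self.w2(z), self.b2(z)

def inr_forward(t, w1, b1, w2, b2):
    """Evaluate the generated INR at time coordinates t: (batch, n, 1).
    The sine activation follows SIREN-style INRs (an assumption here)."""
    h = torch.sin(t * w1.unsqueeze(1) + b1.unsqueeze(1))  # (batch, n, HIDDEN)
    return (h * w2.unsqueeze(1)).sum(-1, keepdim=True) + b2.unsqueeze(1)

# Usage: reconstruct a batch of waveforms and train with a plain MSE loss.
hyper = WaveformHypernetwork()
wave = torch.randn(4, 16384)  # dummy audio batch
t = torch.linspace(-1, 1, 16384).view(1, -1, 1).expand(4, -1, -1)
recon = inr_forward(t, *hyper(wave)).squeeze(-1)  # (4, 16384)
loss = F.mse_loss(recon, wave)
loss.backward()
```

The key design point this sketch illustrates is the meta-learning aspect: the hypernetwork is trained once over a whole dataset, so a waveform unseen at training time receives its own INR in a single forward pass, rather than requiring per-sample optimization of a separate model.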