Implicit Neural Representations (INRs) are widely used to represent multimedia signals across a variety of real-life applications, including image super-resolution, image compression, and 3D rendering. Existing methods that leverage INRs focus predominantly on visual data, since applying them to other modalities, such as audio, is nontrivial due to the inductive biases built into the architectures of image-based INR models. To address this limitation, we introduce HyperSound, the first meta-learning approach to produce INRs for audio samples, which leverages hypernetworks to generalize to samples not observed during training. Our approach reconstructs audio samples with quality comparable to other state-of-the-art models and provides a viable alternative to contemporary sound representations used in deep neural networks for audio processing, such as spectrograms.
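To make the core idea concrete, the sketch below shows a hypernetwork that maps a raw waveform to the weights of a small INR mapping time to amplitude. This is a minimal toy under stated assumptions, not the paper's actual architecture: the one-hidden-layer SIREN-style target network, the fully connected encoder, and the names `HyperNet`, `inr_forward`, and `INR_HIDDEN` are all illustrative.

```python
import torch
import torch.nn as nn

INR_HIDDEN = 64  # hypothetical width of the generated INR

class HyperNet(nn.Module):
    """Maps a raw waveform to the flattened weights of a tiny INR f(t) -> amplitude."""
    def __init__(self, n_samples: int):
        super().__init__()
        # Parameter count of a 1 -> INR_HIDDEN -> 1 MLP (weights + biases).
        self.n_params = 3 * INR_HIDDEN + 1
        self.encoder = nn.Sequential(
            nn.Linear(n_samples, 256),
            nn.ReLU(),
            nn.Linear(256, self.n_params),
        )

    def forward(self, audio: torch.Tensor) -> torch.Tensor:
        # audio: (batch, n_samples) -> per-sample INR weights, (batch, n_params)
        return self.encoder(audio)

def inr_forward(params: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """Evaluate each generated INR at time coordinates t: (batch, T) -> (batch, T)."""
    h = INR_HIDDEN
    w1, b1 = params[:, :h], params[:, h:2 * h]          # first layer, (B, h) each
    w2, b2 = params[:, 2 * h:3 * h], params[:, 3 * h:]  # second layer, (B, h), (B, 1)
    # Sine activation, as in SIREN-style INRs.
    hidden = torch.sin(t[:, None, :] * w1[:, :, None] + b1[:, :, None])  # (B, h, T)
    return (w2[:, :, None] * hidden).sum(dim=1) + b2                     # (B, T)

# End-to-end reconstruction: the hypernetwork, not the INR, is what gets trained.
B, T = 4, 16000
audio = torch.randn(B, T)                       # stand-in for real waveforms
t = torch.linspace(-1.0, 1.0, T).expand(B, T)   # normalized time coordinates
hyper = HyperNet(n_samples=T)
recon = inr_forward(hyper(audio), t)
loss = nn.functional.mse_loss(recon, audio)
loss.backward()
```

The design point this illustrates is amortization: rather than fitting a separate INR to each clip by gradient descent, a single hypernetwork learns to emit INR weights in one forward pass, which is what lets the approach generalize beyond the training samples.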