Audio super resolution aims to predict the missing high-resolution components of low-resolution audio signals. Although audio is by nature a continuous signal, current approaches treat it as discrete data (i.e., the input is defined on a discrete time domain) and perform super resolution only at a fixed scale factor (i.e., changing the output resolution requires training a new neural network). To obtain a continuous representation of audio and enable super resolution at an arbitrary scale factor, we propose a neural implicit representation method, coined Local Implicit representation for Super resolution of Arbitrary scale (LISA). Our method locally parameterizes each chunk of audio as a function of continuous time and represents each chunk with the local latent codes of its neighboring chunks, so that the function can extrapolate the signal at any time coordinate, i.e., at infinite resolution. To learn a continuous representation of audio, we design a self-supervised learning strategy that practices super resolution tasks up to the original resolution through stochastic selection of the scale. Our numerical evaluation shows that LISA not only outperforms previous fixed-scale methods with a fraction of the parameters, but is also capable of arbitrary-scale super resolution even beyond the resolution of the training data.
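The core idea above, querying a chunk-wise latent representation at any continuous time coordinate, can be illustrated with a minimal Python sketch. It is written under stated assumptions and is not LISA's actual architecture: each chunk's context is assumed to be the concatenation of its own latent code with those of its left and right neighbors; `TinyDecoder` is a hypothetical fixed random MLP standing in for a trained network; time is measured in chunk units; and `random_training_pair` is one assumed reading of the stochastic self-supervised strategy. All names and parameters are illustrative.

```python
import math
import random
from typing import List, Tuple

Latent = List[float]


class TinyDecoder:
    """Hypothetical stand-in for a trained decoder f(context, dt) -> amplitude.

    Weights are random and fixed by `seed`; nothing here is learned.
    """

    def __init__(self, in_dim: int, hidden: int = 16, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 0.5) for _ in range(in_dim + 1)] for _ in range(hidden)]
        self.b1 = [rng.gauss(0.0, 0.1) for _ in range(hidden)]
        self.w2 = [rng.gauss(0.0, 0.5) for _ in range(hidden)]

    def __call__(self, context: List[float], dt: float) -> float:
        x = context + [dt]  # latent context plus a continuous time offset
        h = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        return sum(w * v for w, v in zip(self.w2, h))


def query(latents: List[Latent], decoder: TinyDecoder, t: float) -> float:
    """Evaluate the implicit signal at continuous time t (in chunk units)."""
    n = len(latents)
    i = min(max(int(math.floor(t)), 0), n - 1)  # chunk containing t, clamped
    left = latents[max(i - 1, 0)]                # edge-pad at the boundaries
    right = latents[min(i + 1, n - 1)]
    context = left + latents[i] + right          # assumed: concat neighbor codes
    dt = t - (i + 0.5)                           # offset from the chunk center
    return decoder(context, dt)


def render(latents: List[Latent], decoder: TinyDecoder,
           samples_per_chunk: float) -> List[float]:
    """Sample the function on a uniform grid at any, even non-integer, density."""
    n_out = max(1, round(len(latents) * samples_per_chunk))
    step = len(latents) / n_out
    return [query(latents, decoder, (k + 0.5) * step) for k in range(n_out)]


def random_training_pair(signal: List[float], max_factor: int,
                         rng: random.Random) -> Tuple[List[float], List[float], int]:
    """Hypothetical self-supervision: decimate by a random factor, keep full-res target."""
    factor = rng.randint(1, max_factor)  # inclusive on both ends
    return signal[::factor], signal, factor


# Usage: 8 chunks with 4-dimensional latent codes (random here, encoder output in practice).
latent_dim = 4
rng = random.Random(42)
latents = [[rng.gauss(0.0, 1.0) for _ in range(latent_dim)] for _ in range(8)]
decoder = TinyDecoder(in_dim=3 * latent_dim)
coarse = render(latents, decoder, samples_per_chunk=4)
fine = render(latents, decoder, samples_per_chunk=10.5)  # non-integer scale
print(len(coarse), len(fine))  # 32 84
```

Because the decoder takes a real-valued offset `dt`, output density is a free argument of `render` rather than a property of the network, so a non-integer scale such as 10.5 samples per chunk needs no retraining in this sketch.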