We investigate applying audio manipulations using pretrained neural-network-based autoencoders as an alternative to traditional signal processing methods, since the former may provide greater semantic or perceptual organization. To assess the potential of this approach, we first determine whether the representations produced by these models encode information about manipulations. We carry out experiments and produce visualizations using representations from two different pretrained autoencoders. Our findings indicate that, while some information about audio manipulations is encoded, this information is both limited and encoded in a non-trivial way. This is supported by our visualizations of these representations, which show that the trajectories of representations under common manipulations are typically nonlinear and content dependent, even for linear signal manipulations. As a result, it is not yet clear how these pretrained autoencoders can be used to manipulate audio signals; our results suggest this may be due to a lack of disentanglement with respect to common audio manipulations.
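The linearity check described above can be sketched in a few lines: encode a signal at several gain levels (a linear signal manipulation) and measure how far the resulting latent trajectory deviates from the straight line between its endpoints. This is a minimal toy sketch, not the paper's method: the `encode` function below is a hypothetical stand-in for a pretrained autoencoder's encoder, made deliberately nonlinear with `tanh`.

```python
import numpy as np

# Hypothetical stand-in for a pretrained autoencoder's encoder;
# tanh makes it deliberately nonlinear, like a real neural network.
def encode(audio):
    return np.tanh(3.0 * audio)

rng = np.random.default_rng(0)
audio = rng.standard_normal(64) * 0.5  # toy "audio" signal

# Linear signal manipulation: gain scaling over a range of levels.
gains = np.linspace(0.1, 1.0, 10)
latents = np.stack([encode(g * audio) for g in gains])

# Deviation of the latent trajectory from the straight line between
# its endpoints; zero deviation would mean a linear trajectory.
start, end = latents[0], latents[-1]
t = np.linspace(0.0, 1.0, len(gains))[:, None]
line = start + t * (end - start)
deviation = np.max(np.linalg.norm(latents - line, axis=1))
print(f"max deviation from linear trajectory: {deviation:.3f}")
```

With a truly linear encoder the deviation would be zero; a nonzero, content-dependent deviation is the kind of curvature the visualizations in this work reveal.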