The main theme of this paper is to reconstruct audio signal from interrupted measurements. We present a light-weighted model only consisting discrete Fourier transform and Convolutional-based Autoencoder model (ConvAE), called the FFT-ConvAE model for the Helsinki Speech Challenge 2024. The FFT-ConvAE model is light-weighted (in terms of real-time factor) and efficient (in terms of character error rate), which was verified by the organizers. Furthermore, the FFT-ConvAE is a general-purpose model capable of handling all tasks with a unified configuration.
翻译:暂无翻译