Acoustic recognition has emerged as a prominent task in deep learning research, frequently utilizing spectral feature extraction techniques such as the spectrogram from the Short-Time Fourier Transform and the scalogram from the Wavelet Transform. However, there is a notable deficiency in studies that comprehensively discuss the advantages, drawbacks, and performance comparisons of these methods. This paper aims to evaluate the characteristics of these two transforms as input data for acoustic recognition using Convolutional Neural Networks. The performance of the trained models employing both transforms is documented for comparison. Through this analysis, the paper elucidates the advantages and limitations of each method, provides insights into their respective application scenarios, and identifies potential directions for further research.
翻译:暂无翻译