This paper proposes new acoustic feature signatures for content-based similarity search, based on the multiscale fractal dimension (MFD) and robust against the diversity of environmental sounds. Environmental sounds are characterized by diverse sound sources and acoustic compositions. Several acoustic features have been proposed for them, among which the widely used Mel-frequency cepstral coefficients (MFCCs) describe frequency-domain features. However, in addition to these frequency-domain features, environmental sounds have other important features in the time domain, at various time scales. In our previous paper, we proposed the enhanced multiscale fractal dimension signature (EMFD) for environmental sounds. This paper extends EMFD with kernel density estimation, which improves performance in similarity search tasks. It also proposes a second MFD-based acoustic feature signature, the very-long-range multiscale fractal dimension signature (MFD-VL). The MFD-VL signature describes several features of the time-varying envelope over long periods of time. It is stable and robust against background noise and against small fluctuations in sound-source parameters, both of which arise in field recordings. We discuss the effectiveness of these signatures in similarity sound search by comparing them with acoustic features proposed in the DCASE 2018 challenges. Owing to their unique descriptiveness, we confirmed that the proposed signatures are effective when used together with other acoustic features.
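The abstract does not say how kernel density estimation is applied to EMFD. Purely as an illustration, the sketch below assumes that per-frame fractal-dimension values (between 1 and 2 for a waveform) are summarized by a Gaussian kernel density estimate evaluated on a fixed grid, which yields a fixed-length vector that could be compared across recordings. The function name `kde_signature`, the bandwidth, the grid, and the synthetic input values are all hypothetical and are not taken from the paper.

```python
import numpy as np


def kde_signature(values, grid, bandwidth=0.05):
    """Gaussian kernel density estimate of `values`, evaluated on `grid`.

    Hypothetical helper for illustration; not the paper's EMFD procedure.
    """
    values = np.asarray(values, dtype=float)
    grid = np.asarray(grid, dtype=float)
    if values.size == 0:
        raise ValueError("values must be non-empty")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    # Scaled pairwise differences, shape (len(grid), len(values)).
    z = (grid[:, None] - values[None, :]) / bandwidth
    # One Gaussian kernel per sample, averaged over samples.
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return kernels.mean(axis=1)


# Synthetic stand-in for per-frame fractal-dimension values (assumption).
rng = np.random.default_rng(0)
fd_values = np.clip(rng.normal(1.5, 0.1, size=200), 1.0, 2.0)

# Fixed evaluation grid so every recording gets a vector of equal length.
grid = np.linspace(1.0, 2.0, 64)
signature = kde_signature(fd_values, grid)
```

Evaluating the density on a shared grid is one simple way to obtain vectors that can be compared with an ordinary distance measure. The bandwidth controls how much frame-to-frame fluctuation is smoothed away and would need tuning on real data.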