The decomposition of sounds into sines, transients, and noise is a long-standing research problem in audio processing. Current solutions for this three-way separation detect either horizontal and vertical structures or anisotropy and orientations in the spectrogram to identify the properties of each spectral bin and classify it as sinusoidal, transient, or noise. This paper proposes an enhanced three-way decomposition method based on fuzzy logic, enabling soft masking while preserving the perfect reconstruction property. The proposed method allows each spectral bin to belong to two classes simultaneously: sine and noise, or transient and noise. Results of a subjective listening test against three other techniques are reported, showing that the proposed decomposition yields better or comparable quality. The main improvement appears in transient separation, which shows little or no loss of energy and little or no leakage from the other components, and which performs well for test signals with strong transients. For all tested methods, the audio quality of the separation is shown to depend on the complexity of the input signal. The proposed method can help improve the quality of various audio processing applications. As an example, its successful application to a state-of-the-art time-scale modification method is reported.
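To make the idea of soft masking with perfect reconstruction concrete, the following is a minimal sketch and not the method evaluated in this paper. It assumes a complex spectrogram laid out as frequency bins by time frames, uses median filtering along time and along frequency to detect horizontal and vertical structures, and maps the resulting tonalness to memberships with a simple linear ramp. The function names, the filter lengths, and the ramp are all illustrative assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter_1d(mag, size, axis):
    """Median-filter a 2-D array along one axis (odd size, edge padding)."""
    pad = size // 2
    pad_width = [(0, 0), (0, 0)]
    pad_width[axis] = (pad, pad)
    padded = np.pad(mag, pad_width, mode="edge")
    # The window dimension is appended as the last axis.
    windows = sliding_window_view(padded, size, axis=axis)
    return np.median(windows, axis=-1)


def fuzzy_stn_masks(spec, time_len=9, freq_len=9):
    """Return (sine, transient, noise) soft masks for a spectrogram.

    spec: complex array, shape (frequency bins, time frames).
    time_len, freq_len: hypothetical median filter lengths (odd integers).
    """
    mag = np.abs(spec)
    horiz = median_filter_1d(mag, time_len, axis=1)  # keeps horizontal (tonal) ridges
    vert = median_filter_1d(mag, freq_len, axis=0)   # keeps vertical (transient) ridges

    total = horiz + vert
    safe_total = np.where(total > 0, total, 1.0)
    # Tonalness in [0, 1]; bins with no evidence either way are set to 0.5.
    tonalness = np.where(total > 0, horiz / safe_total, 0.5)

    # Assumed linear ramp: a bin is shared between sine and noise, or
    # between transient and noise, but never between sine and transient.
    sine = np.clip(2.0 * tonalness - 1.0, 0.0, 1.0)
    transient = np.clip(1.0 - 2.0 * tonalness, 0.0, 1.0)
    noise = 1.0 - sine - transient  # masks sum to one in every bin
    return sine, transient, noise
```

Because the three masks sum to one in every bin, multiplying the spectrogram by each mask and adding the three results returns the original spectrogram, which is the sense in which such a soft decomposition preserves perfect reconstruction.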