This study introduces an online target sound extraction (TSE) process based on the similarity-and-independence-aware beamformer (SIBF), derived from an iterative batch algorithm. The aim is to reduce latency while maintaining extraction accuracy. The SIBF is a linear method, and it estimates the target more accurately than the approximate magnitude spectrogram that serves as its reference. Moving to an online algorithm reduces latency but raises two challenges. First, contrary to conventional assumptions, the derived online algorithm may be less accurate than the batch algorithm applied over a sliding window. Second, conventional post-processing methods for scaling the estimated target may widen the accuracy gap between the two algorithms. To address these challenges, this study minimizes the accuracy gap at the post-processing stage by proposing a novel scaling method based on the single-channel Wiener filter (SWF-based scaling). To further improve accuracy, the study introduces a modified version of the time-frequency-varying variance generalized Gaussian distribution as a source model representing the joint probability of the target and the reference. Experiments on the CHiME-3 dataset yield three key findings: 1) SWF-based scaling eliminates the accuracy gap between the two algorithms and improves accuracy. 2) The new source model achieves its best accuracy in the configuration that corresponds to the Laplacian model. 3) The proposed online SIBF outperforms conventional linear TSE methods, including independent vector extraction and minimum mean square error beamforming. These findings can contribute to the fields of beamforming and blind source separation.