This paper develops a framework that can perform denoising, dereverberation, and source separation accurately using a relatively small number of microphones. It has been empirically confirmed that Independent Vector Analysis (IVA) can blindly separate $N$ sources from their mixture, even in the presence of diffuse noise, when a sufficiently large number $M$ of microphones is available (i.e., $M\gg N$). However, the estimation accuracy degrades seriously as the number of microphones, or more specifically $M-N$ $(\ge 0)$, decreases. To overcome this limitation of IVA, we propose switching IVA (swIVA). With swIVA, the time frames of an observed signal with time-varying characteristics are clustered into several groups, each of which can be handled well by IVA using a small number of microphones; accurate estimation can thus be achieved by applying IVA individually to each group. A switching mechanism has previously been introduced into beamformers; however, no blind source separation algorithm with a switching mechanism had been successfully developed before this paper. To incorporate dereverberation capability, this paper further extends swIVA to a blind convolutional beamforming algorithm (swCIVA), which integrates swIVA and switching Weighted Prediction Error-based dereverberation (swWPE) in a jointly optimal way. We show that both swIVA and swCIVA can be optimized effectively by blind signal processing, and that their performance can be further improved by using a spatial guide for initialization. Experiments show that both proposed methods substantially outperform conventional IVA and its convolutional beamforming extension (CIVA) in terms of objective signal quality and automatic speech recognition scores when a relatively small number of microphones is used.
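The core switching idea (cluster the time frames, then run IVA separately on each cluster) can be illustrated with a minimal NumPy sketch. It rests on several simplifying assumptions that are not taken from the abstract. It handles only the determined case $M = N$. The cluster labels are given as fixed hard assignments, whereas swIVA estimates the clustering as part of the optimization. Each per-cluster IVA is swapped for standard auxiliary-function IVA with iterative projection (AuxIVA-IP) under a Laplacian source prior. Finally, output channels are not aligned across clusters, so channel $n$ in one cluster may correspond to a different source in another. The function names `auxiva_ip` and `switching_iva` are hypothetical.

```python
import numpy as np


def auxiva_ip(X, W, n_iter=20, eps=1e-8):
    """AuxIVA with iterative projection (Laplacian prior), determined case M == N.

    X: (F, T, M) complex STFT frames; W: (F, M, M) initial demixing matrices,
    where row n of W[f] is w_n^H so that y = W[f] @ x. Returns updated W.
    """
    F, T, M = X.shape
    W = W.astype(complex)  # astype copies, so the caller's array is untouched
    for _ in range(n_iter):
        Y = np.einsum("fnm,ftm->ftn", W, X)           # separated signals (F, T, M)
        r = np.sqrt(np.sum(np.abs(Y) ** 2, axis=0))   # per-frame source norms (T, M)
        phi = 1.0 / np.maximum(r, eps)                # auxiliary weights for G(r) = r
        for n in range(M):
            # Weighted covariance V[f] = (1/T) sum_t phi[t, n] x x^H
            V = np.einsum("t,ftm,ftk->fmk", phi[:, n], X, X.conj()) / T
            e = np.zeros((F, M, 1), dtype=complex)
            e[:, n, 0] = 1.0
            w = np.linalg.solve(W @ V, e)[..., 0]     # w_n = (W V)^{-1} e_n, shape (F, M)
            norm = np.sqrt(np.einsum("fm,fmk,fk->f", w.conj(), V, w).real)
            W[:, n, :] = (w / norm[:, None]).conj()   # store w_n^H with w^H V w = 1
    return W


def switching_iva(X, labels, n_clusters, n_iter=20):
    """Run IVA independently on each cluster of frames (hard labels assumed given).

    X: (F, T, M); labels: (T,) ints in [0, n_clusters).
    Returns per-cluster demixing matrices (K, F, M, M) and outputs (F, T, M).
    """
    F, T, M = X.shape
    W = np.tile(np.eye(M, dtype=complex), (n_clusters, F, 1, 1))
    Y = np.zeros_like(X, dtype=complex)
    for k in range(n_clusters):
        idx = labels == k
        if not np.any(idx):  # empty cluster: keep identity initialization
            continue
        W[k] = auxiva_ip(X[:, idx, :], W[k], n_iter)
        Y[:, idx, :] = np.einsum("fnm,ftm->ftn", W[k], X[:, idx, :])
    return W, Y
```

Each cluster gets its own demixing matrices, which is what lets a small array track time-varying conditions. Because the IVA cost is minimized by coordinate descent on an auxiliary function, each cluster's cost is non-increasing over iterations.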