We propose a Beamformer-guided Target Speaker Extraction (BG-TSE) method that extracts a target speaker's voice from a multi-channel recording, informed by the target's direction of arrival. The proposed method employs a front-end beamformer steered towards the target speaker to provide an auxiliary signal to a single-channel TSE system. By allowing for time-varying embeddings in the single-channel TSE block, the proposed method fully exploits the correspondence between the front-end beamformer output and the target speech in the microphone signal. Experimental evaluation on simulated multi-channel two-speaker mixtures, in both anechoic and reverberant conditions, demonstrates the advantage of the proposed method compared to recent single-channel and multi-channel baselines.
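To make the front-end stage concrete, the following is a minimal sketch of a beamformer steered by a known direction of arrival. The abstract does not say which beamformer is used, so everything here is an assumption for illustration only: a uniform-weight delay-and-sum design, a linear array under a far-field (plane-wave) model, a speed of sound of 343 m/s, and the hypothetical names `delay_and_sum`, `mic_positions`, and `doa_deg`. The single-channel TSE network that would consume the output as its auxiliary signal is not shown.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not taken from the paper


def delay_and_sum(mics, mic_positions, doa_deg, fs):
    """Hypothetical delay-and-sum beamformer steered toward doa_deg.

    mics          : (n_mics, n_samples) time-domain microphone signals
    mic_positions : (n_mics,) positions along the array axis, in metres
    doa_deg       : target angle relative to the array axis (90 = broadside)
    fs            : sampling rate in Hz
    """
    n_mics, n_samples = mics.shape
    # Far-field model: a plane wave from doa_deg reaches a microphone at
    # position p with time offset -p * cos(theta) / c relative to the origin.
    arrival = -mic_positions * np.cos(np.deg2rad(doa_deg)) / SPEED_OF_SOUND
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    spectra = np.fft.rfft(mics, axis=1)
    # Undo each channel's arrival offset with a linear phase shift.
    # Note: FFT-based shifting is circular, which is fine for a sketch
    # but would need framing/overlap-add for real recordings.
    steering = np.exp(2j * np.pi * freqs[None, :] * arrival[:, None])
    aligned = spectra * steering
    # Average the time-aligned channels and return to the time domain.
    return np.fft.irfft(aligned.mean(axis=0), n=n_samples)
```

In a pipeline of the kind the abstract describes, the returned signal would play the role of the auxiliary input, while one microphone channel would serve as the mixture from which the target speech is extracted.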