Domain mismatch caused by speaker-unrelated features has been a major topic in speaker recognition. In this paper, we propose an explicit disentanglement framework that separates speaker-related features from speaker-unrelated features via mutual information (MI) minimization. To minimize the MI between speaker-related and speaker-unrelated features, we adopt the contrastive log-ratio upper bound (CLUB), which provides an upper bound on MI. Our framework has a three-stage structure. First, the front-end encoder encodes the input speech into a shared initial embedding. Next, the decoupling block splits the shared initial embedding into separate speaker-related and speaker-unrelated embeddings. Finally, disentanglement is performed by MI minimization in the last stage. Experiments on the Far-Field Speaker Verification Challenge 2022 (FFSVC2022) demonstrate that the proposed framework is effective for disentanglement. In addition, to exploit domain-unknown datasets containing numerous speakers, we pre-trained the front-end encoder on the VoxCeleb datasets and then fine-tuned the speaker embedding model within the disentanglement framework on the FFSVC2022 dataset. The experimental results show that fine-tuning an existing pre-trained model with the disentanglement framework is valid and can further improve performance.
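As a rough illustration of the CLUB objective mentioned above, the following sketch computes the sampled CLUB estimate, the mean of log q(y_i|x_i) over paired samples minus the mean of log q(y_j|x_i) over all pairs. It is not the paper's implementation: the names `club_estimate` and `make_pairs` are hypothetical, and the variational distribution q(y|x) is assumed here to be a fixed Gaussian with mean `mu(x)`, whereas in practice q would typically be a learned network over the speaker-related and speaker-unrelated embeddings. The estimate is only guaranteed to upper-bound MI when q matches the true conditional.

```python
import random


def club_estimate(xs, ys, mu, var=1.0):
    """Sampled CLUB estimate, assuming q(y|x) = N(mu(x), var * I)."""
    n = len(xs)

    def log_q(x, y):
        # Gaussian log-density up to an additive constant; the constant
        # cancels in the difference of the two terms below.
        m = mu(x)
        return -sum((yk - mk) ** 2 for yk, mk in zip(y, m)) / (2.0 * var)

    # Mean log-likelihood over paired samples (x_i, y_i).
    positive = sum(log_q(x, y) for x, y in zip(xs, ys)) / n
    # Mean log-likelihood over all combinations (x_i, y_j).
    negative = sum(log_q(x, y) for x in xs for y in ys) / (n * n)
    return positive - negative


def make_pairs(n, dependent, dim=2, seed=0):
    """Toy data (an assumption for illustration): y is a noisy copy of x
    when dependent, otherwise y is drawn independently of x."""
    rng = random.Random(seed)
    xs = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    if dependent:
        ys = [[xk + 0.1 * rng.gauss(0.0, 1.0) for xk in x] for x in xs]
    else:
        ys = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    return xs, ys


def identity(x):
    return x


xs, ys = make_pairs(200, dependent=True)
print("dependent pairs:  ", club_estimate(xs, ys, identity))
xs, ys = make_pairs(200, dependent=False)
print("independent pairs:", club_estimate(xs, ys, identity))
```

On this toy data the estimate should be clearly positive for the dependent pairs and close to zero for the independent ones. Minimizing such a quantity with respect to the two embeddings is the sense in which the disentanglement stage pushes them toward independence.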