Automatic speaker verification has made remarkable progress in recent years. However, cross-age speaker verification (CASV) remains little studied because relevant data are scarce. In this paper, we mine cross-age test sets from the VoxCeleb dataset and propose an age-invariant speaker representation (AISR) learning method. Because VoxCeleb is collected from YouTube, it inherently contains cross-age data, but its metadata do not include speaker age labels. We therefore apply a face age estimation method to predict each speaker's age from the associated visual data and label the audio recording with the estimated age. We construct multiple cross-age test sets on VoxCeleb (Vox-CA) that deliberately select positive trials with a large age gap. Nationality and gender are also taken into account when selecting negative pairs, in line with the Vox-H test set. The baseline system's EER rises from 1.939\% on the Vox-H test set to 10.419\% on the Vox-CA20 test set, which shows how difficult the cross-age scenario is. To address this, we propose an age-decoupling adversarial learning (ADAL) method that alleviates the negative effect of the age gap and reduces intra-class variance. Our method outperforms the baseline system by over 10\% relative EER reduction on the Vox-CA20 test set. The source code and trial resources are available at https://github.com/qinxiaoyi/Cross-Age_Speaker_Verification
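The positive-trial selection and the relative EER metric described above can be illustrated with a minimal sketch. It is not taken from the released repository: the function names, the tuple layout, and the toy utterance IDs are hypothetical, and it assumes that the suffix in Vox-CA20 denotes a minimum age gap in years, which the abstract does not state.

```python
from collections import defaultdict
from itertools import combinations


def mine_cross_age_positives(utterances, min_age_gap):
    """Return same-speaker utterance pairs whose estimated ages
    differ by at least `min_age_gap` years.

    `utterances` is a list of (utt_id, speaker_id, estimated_age)
    tuples; this layout is an assumption made for illustration.
    """
    by_speaker = defaultdict(list)
    for utt_id, speaker_id, est_age in utterances:
        by_speaker[speaker_id].append((utt_id, est_age))

    trials = []
    for utts in by_speaker.values():
        # Every unordered pair of utterances from the same speaker
        for (utt_a, age_a), (utt_b, age_b) in combinations(utts, 2):
            if abs(age_a - age_b) >= min_age_gap:
                trials.append((utt_a, utt_b))
    return trials


def relative_eer_reduction(baseline_eer, new_eer):
    """Relative reduction of `new_eer` with respect to `baseline_eer`."""
    return (baseline_eer - new_eer) / baseline_eer


# Toy data with made-up IDs and ages (not from VoxCeleb)
toy_utterances = [
    ("spk1_a", "spk1", 25.0),
    ("spk1_b", "spk1", 47.0),
    ("spk1_c", "spk1", 30.0),
    ("spk2_a", "spk2", 40.0),
    ("spk2_b", "spk2", 45.0),
]
positives = mine_cross_age_positives(toy_utterances, min_age_gap=20)
print(positives)

# 9.3771 is the 10% threshold derived from the 10.419 baseline,
# not a reported result.
print(relative_eer_reduction(10.419, 9.3771))
```

In the toy data only the pair (spk1_a, spk1_b) has an age gap of at least 20 years, so it is the only positive trial kept. Under this definition of relative reduction, an improvement of over 10\% on a 10.419\% baseline corresponds to an EER below roughly 9.38\%.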