Estimating a speaker's age from a single utterance is a classic and challenging problem. Although Label Distribution Learning (LDL) can represent adjacent, hard-to-distinguish ages well, the uncertainty of the age estimate varies from utterance to utterance and from person to person, i.e., the variance of the age distribution differs. To address this issue, we propose the selective variance label distribution learning (SVLDL) method, which adapts the variance of each age distribution. Furthermore, the model uses WavLM as the speech feature extractor and adds gender recognition as an auxiliary task to further improve performance. Two tricks are applied to the loss function to enhance the robustness of the age estimation and improve the quality of the fitted age distribution. Extensive experiments show that the model achieves state-of-the-art performance on all aspects of the NIST SRE08-10 dataset and a real-world dataset.
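To make the idea of a per-utterance variance concrete, the following is a minimal sketch rather than the method as implemented in the paper. It assumes a discretised Gaussian label distribution over ages 0 to 100, a KL-divergence loss, and a hypothetical selection rule that picks, from a small candidate set, the standard deviation whose target distribution lies closest to the model's prediction. The names `gaussian_label_distribution`, `select_sigma`, the age range, and the candidate values are all illustrative assumptions.

```python
import numpy as np

AGES = np.arange(0, 101)  # assumed age support: 0..100 years


def gaussian_label_distribution(true_age, sigma, ages=AGES):
    """Discretised Gaussian centred on true_age, normalised to sum to 1."""
    logits = -0.5 * ((ages - true_age) / sigma) ** 2
    weights = np.exp(logits - logits.max())
    return weights / weights.sum()


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted) for two discrete distributions."""
    target = np.clip(target, eps, None)
    predicted = np.clip(predicted, eps, None)
    return float(np.sum(target * (np.log(target) - np.log(predicted))))


def select_sigma(true_age, predicted, candidate_sigmas=(1.0, 2.0, 4.0, 8.0)):
    """Hypothetical rule: keep the candidate sigma with the lowest KL loss."""
    losses = [
        kl_divergence(gaussian_label_distribution(true_age, s), predicted)
        for s in candidate_sigmas
    ]
    best = int(np.argmin(losses))
    return candidate_sigmas[best], losses[best]


def expected_age(distribution, ages=AGES):
    """Point estimate: the mean of the age distribution."""
    return float(np.dot(distribution, ages))


if __name__ == "__main__":
    # Stand-in for a model output: a fairly uncertain prediction around age 30.
    predicted = gaussian_label_distribution(30, 4.0)
    sigma, loss = select_sigma(30, predicted)
    print(sigma, loss, expected_age(predicted))
```

Under these assumptions, a confident prediction is matched with a narrow target and an uncertain one with a wide target, so the loss does not penalise every utterance against the same fixed variance.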