Sex classification of children's voices allows for an investigation of the development of secondary sex characteristics, which has been a key interest in the field of speech analysis. This research investigated a broad range of acoustic features from scripted and spontaneous speech and applied a hierarchical clustering-based machine learning model to distinguish the sex of children aged 5 to 15 years. We proposed an optimal feature set, and our model achieved an average F1 score (the harmonic mean of precision and recall) of 0.84 across all ages. Our results suggest that sex classification is generally more accurate when a model is developed for each year group rather than for children in 4-year age bands, and that classification accuracy is better for older age groups. We found that spontaneous speech could provide more helpful cues for sex classification than scripted speech, especially for children younger than 7 years. For younger age groups, a broad range of acoustic factors contributed evenly to sex classification, whereas for older age groups, F0-related acoustic factors were generally the most critical predictors. Other important acoustic factors for older age groups include vocal tract length estimators, spectral flux, loudness and unvoiced features.
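As a brief illustration of the metric reported above, the F1 score can be computed from confusion counts as the harmonic mean of precision and recall. The sketch below is only a minimal example: the function name `f1_score` and the counts used are hypothetical and chosen for arithmetic convenience; they are not taken from the study's data or code.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1, the harmonic mean of precision and recall.

    tp, fp, fn are true-positive, false-positive and false-negative
    counts for one class (hypothetical inputs, not the study's data).
    """
    # Precision: fraction of predicted positives that are correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of actual positives that are recovered.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only: 42 correct, 8 false alarms, 8 misses.
example_f1 = f1_score(tp=42, fp=8, fn=8)
print(round(example_f1, 2))
```

With these illustrative counts, precision and recall are both 42/50, so their harmonic mean is the same value. When precision and recall differ, F1 falls below their arithmetic mean, which is why it penalises a model that trades one heavily for the other.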