During the first years of life, infant vocalizations change considerably as infants develop the vocalization skills that enable them to produce speech sounds. Characterizations based on specific acoustic features, protophone categories, or phonetic transcription can represent the sounds infants make at different ages and in different contexts, but they do not fully describe how those sounds are perceived by listeners, can be inefficient to obtain at large scale, and are difficult to visualize in two dimensions without additional statistical processing. Machine-learning approaches provide the opportunity to complement these characterizations with purely data-driven representations of infant sounds. Here, we use spectral feature extraction and unsupervised machine learning, specifically Uniform Manifold Approximation and Projection (UMAP), to obtain a novel two-dimensional spatial representation of infant and caregiver vocalizations extracted from day-long home recordings. UMAP yields a continuous and well-distributed space conducive to certain analyses of infant vocal development. For instance, we found that the dispersion of infant vocalization acoustics within the 2-D space over a day increased from 3 to 9 months of age and then decreased from 9 to 18 months. The method also permits analysis of the similarity between infant and adult vocalizations, which likewise changes with infant age.
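The abstract does not specify how "dispersion" within the 2-D embedding is quantified. One plausible measure is the mean Euclidean distance of a day's embedded vocalizations from their centroid; the sketch below illustrates that metric under this assumption, with `embedding_dispersion` and the toy coordinate arrays being hypothetical names, not from the original work.

```python
import numpy as np

def embedding_dispersion(points):
    """Mean Euclidean distance of 2-D embedding points from their centroid.

    points: (n, 2) array of UMAP coordinates, e.g. all of one infant's
    vocalizations from a single day-long recording. Larger values mean
    the day's vocalizations cover more of the acoustic embedding space.
    (Assumed metric; the paper may use a different dispersion measure.)
    """
    points = np.asarray(points, dtype=float)
    centroid = points.mean(axis=0)          # center of the day's cloud
    return float(np.linalg.norm(points - centroid, axis=1).mean())

# A tight cluster of embedded vocalizations vs. the same shape scaled up:
tight = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1]])
spread = tight * 10.0
assert embedding_dispersion(spread) > embedding_dispersion(tight)
```

Computed per recording day, such a scalar lets dispersion be compared across ages, as in the reported rise from 3 to 9 months and decline from 9 to 18 months.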