This paper addresses the problem of infant cry detection in real-world settings. Most existing cry detection models have been tested on data collected in controlled settings, so the extent to which they generalize to noisy, lived-in environments such as people's homes is unclear. We evaluate several established machine learning approaches alongside a promising modeling strategy that combines deep spectrum and acoustic features. This model recognizes crying events with an F1 score of 0.630 (precision: 0.697, recall: 0.567), showing improved external validity over existing methods for cry detection in everyday real-world settings. As part of our evaluation, we collected and annotated a novel dataset of infant crying, compiled from over 780 hours of high-quality, labeled real-world audio captured by recorders worn by infants in their homes, which we make publicly available. Our findings confirm that a cry detection model trained on in-lab data underperforms when presented with real-world data (in-lab test F1: 0.656, real-world test F1: 0.243), highlighting the value of our new dataset and model.
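For readers less familiar with the metric, F1 is the harmonic mean of precision and recall. The following is a minimal sketch of that relationship, not the authors' evaluation code; the function name `f1_score` and the variable names are illustrative assumptions. Applied to the reported precision and recall, it gives roughly 0.625, slightly below the reported 0.630. One possible explanation, which is an assumption and not stated in the abstract, is that the reported metrics were each averaged over folds or recordings.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values taken from the abstract (real-world test set).
reported_precision = 0.697
reported_recall = 0.567

# Hypothetical variable name; this is a recomputation, not the paper's figure.
f1_from_reported = f1_score(reported_precision, reported_recall)
print(f"F1 recomputed from reported precision and recall: {f1_from_reported:.3f}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, the lower recall (0.567) limits F1 more than the higher precision (0.697) raises it.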