Most existing cry detection models have been tested on data collected in controlled settings, so the extent to which they generalize to noisy, real-world environments is unclear. In this paper, we evaluate several established machine learning approaches, including a model that leverages both deep spectrum and acoustic features. This model recognized crying events with an F1 score of 0.613 (precision: 0.672, recall: 0.552), showing improved external validity over existing cry detection methods in everyday real-world settings. As part of our evaluation, we collect and annotate a novel dataset of infant crying compiled from over 780 hours of labeled real-world audio, captured via recorders worn by infants in their homes, which we make publicly available. Our findings confirm that a cry detection model trained on in-lab data underperforms when presented with real-world data (in-lab test F1: 0.656, real-world test F1: 0.236), highlighting the value of our new dataset and model.