We present an end-to-end computer vision pipeline to detect non-nutritive sucking (NNS) -- an infant sucking pattern with no nutrition delivered -- as a potential biomarker for developmental delays, using off-the-shelf baby monitor video footage. One barrier to clinical (or algorithmic) assessment of NNS stems from its sparsity, requiring experts to wade through hours of footage to find minutes of relevant activity. Our NNS activity segmentation algorithm solves this problem by identifying periods of NNS with high certainty -- up to 94.0\% average precision and 84.9\% average recall across 30 heterogeneous 60 s clips, drawn from our manually annotated NNS clinical in-crib dataset of 183 hours of overnight baby monitor footage from 19 infants. Our method builds on an underlying NNS action recognition algorithm, which uses spatiotemporal deep learning networks and infant-specific pose estimation, achieving 94.9\% accuracy in binary classification on a balanced set of 960 2.5 s NNS vs. non-NNS clips. Tested on our second, independent, and public NNS in-the-wild dataset, NNS recognition reaches 92.3\% accuracy, and NNS segmentation achieves 90.8\% precision and 84.2\% recall.
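Since the segmentation stage builds on a binary classifier over 2.5 s clips, one natural way to turn clip-level predictions into activity segments is to slide the classifier over the video and merge consecutive positive windows. The sketch below illustrates that idea only; the function name, stride, and threshold are illustrative assumptions, not the authors' implementation.

```python
def segment_nns(clip_probs, stride=2.5, threshold=0.5):
    """Merge consecutive positive clip predictions into (start, end) segments.

    Hypothetical post-processing sketch: `clip_probs` holds the NNS
    probability of each 2.5 s window, taken at a fixed stride (seconds).
    Windows scoring at or above `threshold` are treated as NNS and
    adjacent positives are fused into one segment.
    """
    segments = []
    start = None  # start time of the segment currently being built
    for i, p in enumerate(clip_probs):
        t = i * stride
        if p >= threshold:
            if start is None:
                start = t  # open a new segment at this clip's start
        elif start is not None:
            segments.append((start, t))  # close segment at clip boundary
            start = None
    if start is not None:  # segment runs to the end of the video
        segments.append((start, len(clip_probs) * stride))
    return segments
```

For example, four clip scores `[0.9, 0.8, 0.1, 0.7]` would yield two segments: one covering the first two clips and one covering the last.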