We present an end-to-end computer vision pipeline to detect non-nutritive sucking (NNS) -- an infant sucking pattern with no nutrition delivered -- as a potential biomarker for developmental delays, using off-the-shelf baby monitor video footage. One barrier to clinical (or algorithmic) assessment of NNS stems from its sparsity, requiring experts to wade through hours of footage to find minutes of relevant activity. Our NNS activity segmentation algorithm solves this problem by identifying periods of NNS with high certainty -- up to 94.0\% average precision and 84.9\% average recall across 30 heterogeneous 60 s clips, drawn from our manually annotated NNS clinical in-crib dataset of 183 hours of overnight baby monitor footage from 19 infants. Our method builds on an underlying NNS action recognition algorithm, which uses spatiotemporal deep learning networks and infant-specific pose estimation, achieving 94.9\% accuracy in binary classification on a balanced set of 960 2.5 s NNS vs. non-NNS clips. Tested on our second, independent, and public NNS in-the-wild dataset, NNS recognition reaches 92.3\% accuracy, and NNS segmentation achieves 90.8\% precision and 84.2\% recall.
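Since the segmentation stage builds on a binary classifier over 2.5 s clips, one natural way to turn clip-level predictions into activity segments is to slide the classifier over the video and merge consecutive positive windows. The sketch below illustrates that idea only; the function name, stride, and threshold are illustrative assumptions, not the authors' implementation.

```python
def segment_nns(clip_probs, stride=2.5, threshold=0.5):
    """Merge consecutive positive clip predictions into (start, end) segments.

    Hypothetical post-processing sketch: `clip_probs` holds the NNS
    probability of each 2.5 s window, taken at a fixed stride (seconds).
    Windows scoring at or above `threshold` are treated as NNS and
    adjacent positives are fused into one segment.
    """
    segments = []
    start = None  # start time of the segment currently being built
    for i, p in enumerate(clip_probs):
        t = i * stride
        if p >= threshold:
            if start is None:
                start = t  # open a new segment at this clip's start
        elif start is not None:
            segments.append((start, t))  # close segment at clip boundary
            start = None
    if start is not None:  # segment runs to the end of the video
        segments.append((start, len(clip_probs) * stride))
    return segments
```

For example, four clip scores `[0.9, 0.8, 0.1, 0.7]` would yield two segments: one covering the first two clips and one covering the last.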