Movement and pose assessment of newborns lets experienced pediatricians predict neurodevelopmental disorders, allowing early intervention for related diseases. However, most state-of-the-art AI approaches to human pose estimation focus on adults, and no public benchmark exists for infant pose estimation. In this paper, we fill this gap by proposing an infant pose dataset and AggPose, a Deep Aggregation Vision Transformer for pose estimation, which introduces a fast-trained full-transformer framework that does not use convolution operations to extract features in the early stages. It generalizes Transformer + MLP to high-resolution deep layer aggregation within feature maps, thus enabling information fusion between different vision levels. We pre-train AggPose on the COCO pose dataset and apply it to our newly released large-scale infant pose estimation dataset. The results show that AggPose effectively learns multi-scale features across different resolutions and significantly improves the performance of infant pose estimation. AggPose outperforms the hybrid models HRFormer and TokenPose on the infant pose estimation dataset. Moreover, AggPose outperforms HRFormer by 0.7% AP on COCO val pose estimation on average. Our code is available at github.com/SZAR-LAB/AggPose.
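The multi-scale fusion idea described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the pooling, upsampling, and linear ("MLP") projections below are hypothetical stand-ins that show how a high-resolution and a low-resolution feature map can exchange information within a deep-layer-aggregation step.

```python
import numpy as np

def avg_pool2x(x):
    """Downsample an (H, W, C) feature map by 2x with average pooling."""
    H, W, C = x.shape
    return x.reshape(H // 2, 2, W // 2, 2, C).mean(axis=(1, 3))

def upsample2x(x):
    """Upsample an (H, W, C) feature map by 2x with nearest neighbor."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def fuse(hi, lo, w_hi, w_lo):
    """Exchange information between a high- and a low-resolution branch:
    each branch concatenates the (resampled) other branch's features and
    mixes them with a linear projection, a stand-in for the MLP."""
    hi_new = np.concatenate([hi, upsample2x(lo)], axis=-1) @ w_hi
    lo_new = np.concatenate([avg_pool2x(hi), lo], axis=-1) @ w_lo
    return hi_new, lo_new

rng = np.random.default_rng(0)
hi = rng.normal(size=(8, 8, 16))        # high-resolution branch, 16 channels
lo = rng.normal(size=(4, 4, 32))        # low-resolution branch, 32 channels
w_hi = rng.normal(size=(16 + 32, 16))   # projection back to 16 channels
w_lo = rng.normal(size=(16 + 32, 32))   # projection back to 32 channels

hi2, lo2 = fuse(hi, lo, w_hi, w_lo)
print(hi2.shape, lo2.shape)  # (8, 8, 16) (4, 4, 32)
```

After fusion, each branch keeps its own resolution and channel width but now carries information from the other scale, which is the property the abstract attributes to the aggregation design.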