The Alberta Infant Motor Scale (AIMS) is a well-known assessment scheme that evaluates the gross motor development of infants by recording the number of specific poses achieved. With the aid of an image-based pose recognition model, the AIMS evaluation procedure can be shortened and automated, providing an early diagnosis or indicator of potential developmental disorders. Because public infant-related datasets are limited, many works use SMIL-based methods to generate synthetic infant images for training. However, the domain mismatch between real and synthetic training samples often degrades performance during inference. In this paper, we present a CNN-based model that takes any infant image as input and predicts coarse- and fine-level pose labels. The model consists of an image branch and a pose branch: the image branch generates coarse-level logits with the help of unsupervised domain adaptation, and the pose branch estimates 3D keypoints using HRNet with SMPLify optimization. The outputs of both branches are then sent to a hierarchical pose recognition module to estimate the fine-level pose labels. We also collect and label a new AIMS dataset, which contains 750 real and 4000 synthetic infant images with AIMS pose labels. Our experimental results show that the proposed method can significantly align the distributions of the synthetic and real-world datasets, thus achieving accurate fine-grained infant pose recognition.
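To make the coarse-to-fine idea concrete, the following is a minimal sketch of one way a hierarchical decision could combine the two branches. It is not the paper's module: the label hierarchy, the label names, the function `hierarchical_predict`, and the assumption that the pose branch yields one score per fine-level label are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical label hierarchy: coarse posture -> fine-level pose labels.
# These names are placeholders, not the labels used in the AIMS dataset.
HIERARCHY = {
    "prone": ["prone_a", "prone_b"],
    "supine": ["supine_a", "supine_b"],
    "sitting": ["sitting_a"],
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hierarchical_predict(coarse_logits, fine_scores):
    """Pick a coarse class first, then the best fine label inside it.

    coarse_logits: dict mapping coarse label -> logit
                   (assumed to come from the image branch).
    fine_scores:   dict mapping fine label -> score
                   (assumed to be derived from the pose branch keypoints).
    """
    coarse_names = list(HIERARCHY)
    probs = softmax([coarse_logits[name] for name in coarse_names])
    coarse = coarse_names[probs.index(max(probs))]

    # Only fine labels that belong to the chosen coarse class are candidates.
    candidates = HIERARCHY[coarse]
    fine = max(candidates, key=lambda label: fine_scores[label])
    return coarse, fine


coarse, fine = hierarchical_predict(
    {"prone": 2.0, "supine": 0.1, "sitting": -1.0},
    {"prone_a": 0.2, "prone_b": 0.7, "supine_a": 0.9,
     "supine_b": 0.1, "sitting_a": 0.5},
)
print(coarse, fine)
```

In this toy call, `supine_a` has the highest fine-level score overall, but it is excluded because the coarse stage selects `prone`; restricting the fine-level decision to the chosen coarse class is the point of the hierarchy in this sketch.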