Medical applications have benefited from rapid advances in computer vision. For patient monitoring in particular, in-bed human posture estimation provides important health-related metrics with potential value in medical condition assessment. Despite great progress in this domain, it remains a challenging task due to substantial ambiguity under occlusion and the lack of large corpora of manually labeled data for model training, particularly in domains such as thermal infrared imaging, which is privacy-preserving and thus of great interest. Motivated by the effectiveness of self-supervised methods in learning features directly from data, we propose a multi-modal conditional variational autoencoder (MC-VAE) capable of reconstructing features of missing modalities seen during training. This approach is used with HRNet to enable single-modality inference for in-bed pose estimation. Through extensive evaluations, we demonstrate that body positions can be effectively recognized from the available modality, achieving results on par with baseline models that depend heavily on access to multiple modalities at inference time. The proposed framework supports future research toward self-supervised learning that builds a robust model from a single source and expects it to generalize over many unknown distributions in clinical environments.
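To make the role of the MC-VAE concrete, the following is a minimal sketch of a standard conditional VAE objective for reconstructing a missing modality from an available one; the symbols $x_a$ (available modality, e.g. thermal infrared), $x_m$ (missing modality), latent $z$, encoder $q_\phi$, and decoder/prior $p_\theta$ are illustrative assumptions rather than the paper's own notation:

$$
\mathcal{L}(\theta,\phi;\, x_m, x_a) \;=\; \mathbb{E}_{q_\phi(z \mid x_m, x_a)}\big[\log p_\theta(x_m \mid z, x_a)\big] \;-\; D_{\mathrm{KL}}\big(q_\phi(z \mid x_m, x_a)\,\|\, p_\theta(z \mid x_a)\big)
$$

Under this kind of formulation, only $x_a$ is observed at test time, so features of $x_m$ can be approximated by sampling $z \sim p_\theta(z \mid x_a)$ and decoding, which is what would allow a downstream pose estimator such as HRNet to operate on a single modality.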