The objective of this work is to segment human body parts from egocentric video using semantic segmentation networks. Our contribution is two-fold: i) we create a semi-synthetic dataset of more than 15,000 realistic images with associated pixel-wise labels of egocentric human body parts, such as arms and legs, covering different demographic factors; ii) building upon the ThunderNet architecture, we implement a deep learning semantic segmentation algorithm that performs beyond real-time requirements (16 ms for 720 × 720 images). We believe this method will enhance the sense of presence in virtual environments and will offer a more realistic alternative to standard virtual avatars.