In this work we present a real-time egocentric body segmentation algorithm. It achieves a frame rate of 66 fps at an input resolution of 640x480, thanks to a shallow network inspired by Thundernet's architecture. We also place strong emphasis on the variability of the training data. More concretely, we describe the creation of our Egocentric Bodies (EgoBodies) dataset, composed of almost 10,000 images from three datasets, obtained through both synthetic generation and real capture. We conduct experiments to understand the contribution of each individual dataset, compare the Thundernet model trained on EgoBodies with simpler and more complex previous approaches, and discuss their performance in a real-life setup in terms of segmentation quality and inference time. The trained semantic segmentation algorithm is already integrated into an end-to-end system for Mixed Reality (MR), making it possible for users to see their own bodies while immersed in an MR scene.