As the COVID-19 pandemic rampages across the world, the demand for video conferencing has surged. Consequently, real-time portrait segmentation has become a popular feature for replacing the backgrounds of conference participants. While feature-rich datasets, models, and algorithms have been offered for segmentation that extracts human bodies from real-life scenes, portrait segmentation has not yet been well covered in the video conferencing context. To facilitate progress in this field, we introduce an open-source solution named PP-HumanSeg. This work is the first to construct a large-scale video portrait dataset, containing 291 videos from 23 conference scenes with 14K fine-labeled frames, along with extensions to multi-camera teleconferencing. Furthermore, we propose a novel Semantic Connectivity-aware Learning (SCL) framework for semantic segmentation, which introduces a semantic connectivity-aware loss to improve the quality of segmentation results from the perspective of connectivity. We also propose an ultra-lightweight model trained with SCL for practical portrait segmentation, which achieves the best trade-off between IoU and inference speed. Extensive evaluations on our dataset demonstrate the superiority of SCL and our model. The source code is available at https://github.com/PaddlePaddle/PaddleSeg.
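The abstract does not spell out the connectivity-aware loss; as a rough illustration of the underlying idea, below is a minimal NumPy/SciPy sketch of a connectivity score over binary masks. The component matching and normalization here are simplifications of ours, and the function names (`semantic_connectivity`, `connectivity_loss`) are hypothetical, not the paper's exact formulation or API.

```python
import numpy as np
from scipy import ndimage


def semantic_connectivity(pred, gt):
    """Simplified connectivity score between two binary masks (illustrative
    only, not the paper's exact SC definition): each ground-truth connected
    component is matched to its best-overlapping predicted component by IoU,
    and the IoUs are averaged over max(#pred, #gt) components, so both
    fragmented predictions and spurious blobs lower the score."""
    pred_lbl, n_pred = ndimage.label(pred)
    gt_lbl, n_gt = ndimage.label(gt)
    if n_pred == 0 and n_gt == 0:
        return 1.0  # both masks empty: trivially consistent
    if n_pred == 0 or n_gt == 0:
        return 0.0  # one mask empty, the other not
    ious = []
    for j in range(1, n_gt + 1):
        g = gt_lbl == j
        best = 0.0
        for i in range(1, n_pred + 1):
            p = pred_lbl == i
            inter = np.logical_and(p, g).sum()
            if inter:
                best = max(best, inter / np.logical_or(p, g).sum())
        ious.append(best)
    return sum(ious) / max(n_pred, n_gt)


def connectivity_loss(pred, gt):
    """Connectivity-aware term (1 - SC), meant to be added to a standard
    pixel-wise segmentation loss such as cross-entropy."""
    return 1.0 - semantic_connectivity(pred, gt)
```

In this sketch, a prediction that splits one person into several disconnected blobs, or that hallucinates an extra blob, scores lower even if its pixel-wise IoU is high, which is the intuition behind optimizing segmentation quality "from the perspective of connectivity".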