Recently, due to COVID-19 and the growing demand for remote work, video conferencing apps have become especially widespread. The most valuable features of video chats are real-time background removal and face beautification. While solving these tasks, computer vision researchers face a shortage of relevant training data. No large dataset offers high-quality labels and diverse images of people in front of a laptop or smartphone camera that would suffice to train a lightweight model without additional approaches. To boost progress in this area, we provide a new image dataset, EasyPortrait, for portrait segmentation and face parsing tasks. It contains 20,000 primarily indoor photos of 8,377 unique users, with fine-grained segmentation masks separated into 9 classes. Images are collected and labeled via crowdsourcing platforms. Unlike most face parsing datasets, EasyPortrait does not treat the beard as part of the skin mask, and it separates the inside of the mouth from the teeth. These features allow EasyPortrait to be used for skin enhancement and teeth whitening tasks. This paper describes the pipeline for creating a large-scale, clean image segmentation dataset using crowdsourcing platforms without additional synthetic data. Moreover, we train several models on EasyPortrait and report experimental results. The proposed dataset and trained models are publicly available.