Face parsing is the per-pixel labeling of images containing human faces, where the labels identify key facial regions such as the eyes, lips, nose, and hair. In this work, we exploit the structural consistency of the human face to propose FP-LIIF, a lightweight face-parsing method built on a Local Implicit Function network. The architecture is simple: a convolutional encoder followed by a pixel MLP decoder. It uses 1/26th the number of parameters of state-of-the-art models, yet matches or outperforms them on multiple datasets, such as CelebAMask-HQ and LaPa. We do not use any pretraining, and unlike other works, our network can also generate segmentation at different resolutions without any change in the input resolution. Its higher FPS and smaller model size enable facial segmentation on low-compute or low-bandwidth devices.
Title: Parameter-Efficient Local Implicit Image Function Network for Face Segmentation
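To illustrate how a pixel MLP decoder can produce segmentation maps at several output resolutions from one fixed encoder output, the following is a minimal, framework-free sketch. It is not the FP-LIIF implementation: the encoder is replaced by a random feature map, the MLP weights are random and untrained, and all names (`make_mlp`, `mlp_logits`, `cell_center`, `decode`) as well as the layer sizes are hypothetical choices made for illustration. The only idea it aims to show is that each output pixel is labeled by querying the nearest feature vector together with the pixel's relative offset, so the output grid size can be chosen freely.

```python
import random


def make_mlp(in_dim, hidden_dim, num_classes, seed=0):
    """Random (untrained) weights for a two-layer per-pixel MLP."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(hidden_dim)]
    w2 = [[rng.uniform(-1, 1) for _ in range(hidden_dim)] for _ in range(num_classes)]
    return w1, w2


def mlp_logits(x, mlp):
    """Apply linear -> ReLU -> linear to one input vector."""
    w1, w2 = mlp
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def cell_center(index, size):
    """Center of cell `index` on a grid of `size` cells spanning [-1, 1]."""
    return -1.0 + (2.0 * index + 1.0) / size


def decode(features, out_h, out_w, mlp):
    """Label an out_h x out_w grid from a fixed [H][W][C] feature map."""
    feat_h, feat_w = len(features), len(features[0])
    labels = []
    for i in range(out_h):
        y = cell_center(i, out_h)
        # Feature cell that contains the query point (its nearest center).
        fi = min(feat_h - 1, int((y + 1.0) / 2.0 * feat_h))
        row = []
        for j in range(out_w):
            x = cell_center(j, out_w)
            fj = min(feat_w - 1, int((x + 1.0) / 2.0 * feat_w))
            # Offset of the query from the feature cell center, in cell units.
            dy = (y - cell_center(fi, feat_h)) * feat_h
            dx = (x - cell_center(fj, feat_w)) * feat_w
            logits = mlp_logits(features[fi][fj] + [dy, dx], mlp)
            row.append(max(range(len(logits)), key=logits.__getitem__))
        labels.append(row)
    return labels


# Stand-in for the encoder output: a 4 x 4 map with 8 channels (assumed sizes).
rng = random.Random(1)
channels = 8
features = [[[rng.uniform(-1, 1) for _ in range(channels)]
             for _ in range(4)] for _ in range(4)]
mlp = make_mlp(in_dim=channels + 2, hidden_dim=16, num_classes=3)

# The same features and the same decoder yield maps of different sizes.
low_res = decode(features, 8, 8, mlp)
high_res = decode(features, 32, 32, mlp)
```

In this sketch the input image and the feature map never change between the two calls; only the query grid does. A trained model would replace the random feature map with the convolutional encoder's output and would learn the MLP weights from labeled data.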