This paper presents a comprehensive analysis of datasets for occlusion-aware face segmentation, a task that is crucial for many downstream applications. Collecting and annotating such datasets is time-consuming and labor-intensive. Although some efforts have been made in synthetic data generation, the naturalistic aspect of the data remains less explored. In our study, we propose two occlusion generation techniques: Naturalistic Occlusion Generation (NatOcc), which produces high-quality, naturalistic synthetic occluded faces, and Random Occlusion Generation (RandOcc), a more general method for generating synthetic occluded data. We empirically show the effectiveness and robustness of both methods, even for unseen occlusions. To facilitate model evaluation, we present two high-resolution real-world occluded face datasets with fine-grained annotations, RealOcc and RealOcc-Wild, featuring both careful alignment preprocessing and an in-the-wild setting for robustness testing. We further conduct a comprehensive analysis on a newly introduced segmentation benchmark, offering insights for future exploration.