Unconditional human image generation is an important task in vision and graphics, which enables various applications in the creative industry. Existing studies in this field mainly focus on "network engineering" such as designing new components and objective functions. This work takes a data-centric perspective and investigates multiple critical aspects in "data engineering", which we believe would complement the current practice. To facilitate a comprehensive study, we collect and annotate a large-scale human image dataset with over 230K samples capturing diverse poses and textures. Equipped with this large dataset, we rigorously investigate three essential factors in data engineering for StyleGAN-based human generation, namely data size, data distribution, and data alignment. Extensive experiments reveal several valuable observations w.r.t. these aspects: 1) Large-scale data, more than 40K images, are needed to train a high-fidelity unconditional human generation model with vanilla StyleGAN. 2) A balanced training set helps improve the generation quality with rare face poses compared to the long-tailed counterpart, whereas simply balancing the clothing texture distribution does not effectively bring an improvement. 3) Human GAN models with body centers for alignment outperform models trained using face centers or pelvis points as alignment anchors. In addition, a model zoo and human editing applications are demonstrated to facilitate future research in the community.
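The second finding above contrasts a balanced training set with its long-tailed counterpart. The abstract does not say how the balanced set was built, so the following is only a minimal sketch of one generic way to rebalance a dataset by an attribute bin, not the authors' procedure. The function name `balance_by_bin`, the `yaw` field, and the 30-degree bucket width are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def balance_by_bin(samples, bin_of, per_bin=None, seed=0):
    """Return a resampled list with the same number of items from every bin.

    samples: list of arbitrary items (here, dicts describing images).
    bin_of:  function mapping an item to a hashable bin label
             (hypothetical; e.g. a bucket of face yaw angles).
    per_bin: items to draw per bin; defaults to the smallest bin's size.
    """
    rng = random.Random(seed)

    # Group items by their bin label.
    bins = defaultdict(list)
    for item in samples:
        bins[bin_of(item)].append(item)
    if not bins:
        return []

    if per_bin is None:
        per_bin = min(len(items) for items in bins.values())

    balanced = []
    for label in sorted(bins):
        items = bins[label]
        if len(items) >= per_bin:
            # Enough items: subsample without replacement.
            balanced.extend(rng.sample(items, per_bin))
        else:
            # Rare bin: oversample with replacement to reach the target.
            balanced.extend(rng.choices(items, k=per_bin))
    rng.shuffle(balanced)
    return balanced


def yaw_bin(item):
    """Assumed bucketing: 0 = near-frontal, 1 = mid, 2 = near-profile."""
    return min(int(abs(item["yaw"]) // 30), 2)


# Toy long-tailed set: mostly frontal faces, few profile views.
toy = (
    [{"id": i, "yaw": 5.0} for i in range(90)]
    + [{"id": 90 + i, "yaw": 45.0} for i in range(8)]
    + [{"id": 98 + i, "yaw": -75.0} for i in range(2)]
)

balanced_small = balance_by_bin(toy, yaw_bin)              # undersample to rarest bin
balanced_large = balance_by_bin(toy, yaw_bin, per_bin=10)  # oversample rare bins
```

With the default `per_bin`, every bin is cut down to the size of the rarest one, which discards most of the data. Passing a larger `per_bin` keeps more common samples but repeats rare ones, so which trade-off is appropriate depends on how much data remains relative to the scale the first finding calls for.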