Recent work has shown the benefits of synthetic data for use in computer vision, with applications ranging from autonomous driving to face landmark detection and reconstruction. Using synthetic data has a number of benefits, from privacy preservation and bias elimination to the quality and feasibility of annotation. Generating human-centered synthetic data is a particular challenge in terms of realism and domain gap, though recent work has shown that effective machine learning models can be trained using synthetic face data alone. We show that this can be extended to include the full body by building on the pipeline of Wood et al. to generate synthetic images of humans in their entirety, with ground-truth annotations for computer vision applications. In this report we describe how we construct a parametric model of the face and body, including articulated hands; our rendering pipeline to generate realistic images of humans based on this body model; an approach for training deep neural networks (DNNs) to regress a dense set of landmarks covering the entire body; and a method for fitting our body model to dense landmarks predicted from multiple views.