Existing person image generative models can perform either image generation or pose transfer, but not both. We propose UPGPT, a unified diffusion model that provides a universal solution for all person image tasks: generation, pose transfer, and editing. With multimodal conditioning and disentanglement capabilities, our approach offers fine-grained control over the generation and editing of images using any combination of pose, text, and image, all without needing a semantic segmentation mask, which can be challenging to obtain or edit. We also pioneer the use of the parametric SMPL body model in pose-guided person image generation to demonstrate a new capability: simultaneous pose and camera-view interpolation while maintaining a person's appearance. Results on the benchmark DeepFashion dataset show that UPGPT is the new state of the art while simultaneously pioneering new editing and pose-transfer capabilities in human image generation.
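To make the SMPL-based interpolation claim concrete, here is a minimal sketch (not the authors' code) of how one might interpolate between two SMPL poses: each of the 24 joint rotations is an axis-angle vector, so the rotations are spherically interpolated while the camera translation is interpolated linearly. The `diffusion_sample` call and `fixed_cond` variable are hypothetical stand-ins for the conditional generator and fixed appearance conditioning described in the abstract.

```python
# Sketch of pose/camera interpolation under fixed appearance conditioning.
# Assumes SMPL poses are 72-dim axis-angle vectors (24 joints x 3).
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def interpolate_smpl(pose_a, pose_b, t):
    """Spherically interpolate two SMPL poses at t in [0, 1], per joint."""
    joints_a = pose_a.reshape(24, 3)
    joints_b = pose_b.reshape(24, 3)
    out = np.empty_like(joints_a)
    for j in range(24):
        rots = Rotation.from_rotvec(np.stack([joints_a[j], joints_b[j]]))
        out[j] = Slerp([0.0, 1.0], rots)(t).as_rotvec()
    return out.reshape(72)

# Hypothetical usage: sweep t to obtain a smooth pose and camera-view
# transition while the appearance conditioning (text + image) stays fixed.
pose_a, pose_b = np.zeros(72), np.random.uniform(-0.3, 0.3, 72)
cam_a, cam_b = np.array([0.0, 0.0, 2.0]), np.array([0.5, 0.0, 2.5])
for t in np.linspace(0.0, 1.0, 8):
    pose_t = interpolate_smpl(pose_a, pose_b, t)
    cam_t = (1 - t) * cam_a + t * cam_b  # linear camera interpolation
    # image_t = diffusion_sample(pose=pose_t, camera=cam_t, appearance=fixed_cond)
```

Slerp per joint avoids the gimbal artifacts that naive linear interpolation of axis-angle vectors can produce for large rotations.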