Recent diffusion models achieve personalization by learning specific subjects, allowing learned attributes to be integrated into generated images. However, personalized human image generation remains challenging because it requires precise and consistent preservation of attributes such as identity and clothing details. Existing subject-driven image generation methods typically require either (1) inference-time fine-tuning on a few images of each new subject or (2) training on a large-scale dataset for generalization. Both approaches are computationally expensive and impractical for real-time applications. To address these limitations, we present Wardrobe Polyptych LoRA, a novel part-level controllable model for personalized human image generation. By training only LoRA layers, our method removes the computational burden at inference while ensuring high-fidelity synthesis of unseen subjects. Our key idea is to condition generation on the subject's wardrobe and to leverage spatial references to reduce information loss, thereby improving fidelity and consistency. In addition, we introduce a selective subject region loss that encourages the model to disregard some of the reference images during training, ensuring that generated images align better with the text prompt while maintaining subject integrity. Notably, Wardrobe Polyptych LoRA requires no additional parameters at inference and performs generation with a single model trained on a few samples. We also construct a new dataset and benchmark tailored to personalized human image generation. Extensive experiments show that our approach significantly outperforms existing methods in fidelity and consistency, enabling realistic, identity-preserving full-body synthesis.
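The selective subject region loss is described only at a high level above. Below is a minimal PyTorch sketch of one plausible reading, assuming a standard noise-prediction diffusion objective over a polyptych canvas whose reference panels and subject region are given by binary masks; the names `subject_mask`, `panels`, and `drop_prob` are illustrative assumptions, not the paper's actual implementation.

```python
import torch

def drop_reference_panels(panels, drop_prob=0.3):
    """Randomly blank out wardrobe reference panels before conditioning.

    Encourages the model not to rely on every reference being present,
    so text-prompt guidance still dominates when a panel is missing.
    `drop_prob` is a hypothetical rate, not taken from the paper.
    """
    return [p if torch.rand(()).item() >= drop_prob else torch.zeros_like(p)
            for p in panels]

def subject_region_loss(eps_pred, eps_target, subject_mask):
    """Masked denoising loss computed over the subject region only.

    eps_pred, eps_target: (B, C, H, W) predicted and true noise.
    subject_mask: (B, 1, H, W) binary mask marking the subject region
    of the polyptych canvas; reference panels receive no gradient.
    """
    se = (eps_pred - eps_target) ** 2
    return (se * subject_mask).sum() / subject_mask.sum().clamp(min=1.0)
```

Restricting supervision to the subject region keeps the LoRA layers focused on reproducing the subject, while random panel dropping is one simple way to realize the "disregard some of the reference images" behavior the abstract describes.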