Current text-to-image (T2I) models have made remarkable progress in creative image generation, yet they still lack precise control over scene illuminants, a crucial factor for content designers who want to manipulate the mood, atmosphere, and visual aesthetics of generated images. In this paper, we present LumiCtrl, an illuminant personalization method that learns an illuminant prompt from a single image of an object. LumiCtrl consists of three components. Given an image of the object, our method applies (a) physics-based illuminant augmentation along the Planckian locus to create fine-tuning variants under standard illuminants; (b) edge-guided prompt disentanglement with a frozen ControlNet, ensuring the learned prompt captures illumination rather than structure; and (c) a masked reconstruction loss that concentrates learning on the foreground object while letting the background adapt contextually, enabling what we call contextual light adaptation. We compare LumiCtrl qualitatively and quantitatively against other T2I customization methods; the results show significantly better illuminant fidelity, aesthetic quality, and scene coherence than existing personalization baselines. A human preference study further confirms strong user preference for LumiCtrl outputs. Code and data will be released upon publication.
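As a rough illustration of step (a), and not LumiCtrl's actual implementation (which is not specified here), Planckian-locus relighting is commonly approximated by converting a correlated color temperature (CCT) to an RGB white point and applying per-channel von Kries-style gains. The sketch below uses Tanner Helland's widely circulated empirical CCT-to-RGB fit; all function names and the reference CCT of 6500 K are illustrative assumptions.

```python
import math
import numpy as np

def kelvin_to_rgb(cct: float) -> np.ndarray:
    """Approximate RGB white point of a blackbody at `cct` kelvin
    (Tanner Helland's empirical fit, roughly valid for 1000-40000 K)."""
    t = cct / 100.0
    # Red channel
    r = 255.0 if t <= 66 else 329.698727446 * (t - 60) ** -0.1332047592
    # Green channel
    if t <= 66:
        g = 99.4708025861 * math.log(t) - 161.1195681661
    else:
        g = 288.1221695283 * (t - 60) ** -0.0755148492
    # Blue channel
    if t >= 66:
        b = 255.0
    elif t <= 19:
        b = 0.0
    else:
        b = 138.5177312231 * math.log(t - 10) - 305.0447927307
    return np.clip([r, g, b], 0.0, 255.0) / 255.0

def planckian_augment(img: np.ndarray, target_cct: float,
                      reference_cct: float = 6500.0) -> np.ndarray:
    """Relight `img` (H x W x 3, floats in [0, 1], assumed balanced to
    `reference_cct`) toward `target_cct` via diagonal (von Kries) gains."""
    gains = kelvin_to_rgb(target_cct) / kelvin_to_rgb(reference_cct)
    return np.clip(img * gains, 0.0, 1.0)

# Fine-tuning variants under standard illuminants on the Planckian locus,
# e.g. illuminant A (~2856 K), D50 (~5003 K), D65 (~6504 K):
gray = np.full((4, 4, 3), 0.5)
variants = {cct: planckian_augment(gray, cct)
            for cct in (2856.0, 5003.0, 6504.0)}
```

Warmer targets shift the image toward red (blue gain drops), while a target near the reference leaves it nearly unchanged; the single source image thus yields several physically plausible illumination variants for fine-tuning.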