Despite their impressive visual fidelity, existing personalized generative models lack interactive control over spatial composition and scale poorly to multiple subjects. To address these limitations, we present LayerComposer, an interactive framework for personalized, multi-subject text-to-image generation. Our approach introduces two main contributions: (1) a layered canvas, a novel representation in which each subject is placed on a distinct layer, enabling occlusion-free composition; and (2) a locking mechanism that preserves selected layers with high fidelity while allowing the remaining layers to adapt flexibly to the surrounding context. As in professional image-editing software, the proposed layered canvas allows users to place, resize, or lock input subjects through intuitive layer manipulation. Our versatile locking mechanism requires no architectural changes, relying instead on inherent positional embeddings combined with a new complementary data-sampling strategy. Extensive experiments demonstrate that LayerComposer achieves superior spatial control and identity preservation compared with state-of-the-art methods in multi-subject personalized image generation.
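To make the layered-canvas interaction concrete, the sketch below models the user-facing operations named above (place, resize, lock) as a minimal data structure. The abstract does not specify any implementation, so all class and field names here (`Layer`, `LayeredCanvas`, `subject_id`, `locked`, etc.) are hypothetical illustrations, not the paper's API.

```python
from dataclasses import dataclass, field

@dataclass
class Layer:
    """One subject on its own layer (hypothetical representation)."""
    subject_id: str      # identifier of the personalized subject
    x: int               # top-left placement on the canvas
    y: int
    width: int
    height: int
    locked: bool = False # locked layers are preserved with high fidelity

@dataclass
class LayeredCanvas:
    """A canvas holding per-subject layers, mimicking layer panels
    in professional image-editing software."""
    width: int
    height: int
    layers: list = field(default_factory=list)

    def place(self, layer: Layer) -> None:
        # Each subject lives on a distinct layer, so layers can
        # overlap without destructively occluding one another.
        self.layers.append(layer)

    def resize(self, subject_id: str, width: int, height: int) -> None:
        for layer in self.layers:
            if layer.subject_id == subject_id:
                layer.width, layer.height = width, height

    def lock(self, subject_id: str) -> None:
        # Locked layers are kept fixed; unlocked layers remain free
        # to adapt to the surrounding context during generation.
        for layer in self.layers:
            if layer.subject_id == subject_id:
                layer.locked = True
```

In this sketch, the generative model would consume the canvas state (positions, sizes, lock flags) as conditioning; the paper realizes the lock behavior through positional embeddings and data sampling rather than any architectural change.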