LayerComposer: Interactive Personalized T2I via Spatially-Aware Layered Canvas

Guocheng Gordon Qian, Ruihang Zhang, Tsai-Shien Chen, Yusuf Dalva, Anujraaj Argo Goyal, Willi Menapace, Ivan Skorokhodov, Meng Dong, Arpit Sahni, Daniil Ostashev, Ju Hu, Sergey Tulyakov, Kuan-Chieh Jackson Wang

Source: arXiv preprint, 9 pages. Project page: https://snap-research.github.io/layercomposer/

Despite their impressive visual fidelity, existing personalized generative models lack interactive control over spatial composition and scale poorly to multiple subjects. To address these limitations, we present LayerComposer, an interactive framework for personalized, multi-subject text-to-image generation. Our approach introduces two main contributions: (1) a layered canvas, a novel representation in which each subject is placed on a distinct layer, enabling occlusion-free composition; and (2) a locking mechanism that preserves selected layers with high fidelity while allowing the remaining layers to adapt flexibly to the surrounding context. Similar to professional image-editing software, the proposed layered canvas allows users to place, resize, or lock input subjects through intuitive layer manipulation. Our versatile locking mechanism requires no architectural changes, relying instead on inherent positional embeddings combined with a new complementary data sampling strategy. Extensive experiments demonstrate that LayerComposer achieves superior spatial control and identity preservation compared to the state-of-the-art methods in multi-subject personalized image generation.

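To make the user-facing idea in the abstract concrete, here is a minimal Python sketch of a layered canvas that supports the three operations the abstract names: place, resize, and lock. It is an illustration only and not the authors' implementation. The names `Layer`, `LayeredCanvas`, `place`, `resize`, `lock`, and `locked_names`, as well as the pixel-box representation, are assumptions introduced here. The sketch models only the bookkeeping of layers and says nothing about how the generative model consumes them.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Layer:
    """One subject on its own layer (hypothetical structure, not from the paper)."""
    name: str
    x: int
    y: int
    width: int
    height: int
    locked: bool = False  # True means the user asked for this layer to be preserved


@dataclass
class LayeredCanvas:
    """Keeps one layer per subject, so placing a subject never overwrites another."""
    width: int
    height: int
    layers: Dict[str, Layer] = field(default_factory=dict)

    def place(self, name: str, x: int, y: int, width: int, height: int) -> Layer:
        """Add a subject as a new, unlocked layer at the given box."""
        if name in self.layers:
            raise ValueError(f"layer {name!r} already exists")
        self.layers[name] = Layer(name, x, y, width, height)
        return self.layers[name]

    def resize(self, name: str, scale: float) -> None:
        """Scale a layer's box about its top-left corner."""
        layer = self.layers[name]
        layer.width = round(layer.width * scale)
        layer.height = round(layer.height * scale)

    def lock(self, name: str, locked: bool = True) -> None:
        """Mark a layer as locked (or unlock it with locked=False)."""
        self.layers[name].locked = locked

    def locked_names(self) -> List[str]:
        """Names of all locked layers, in insertion order."""
        return [n for n, layer in self.layers.items() if layer.locked]


# Usage: two subjects on separate layers; one is resized, the other locked.
canvas = LayeredCanvas(width=1024, height=1024)
canvas.place("person_a", x=100, y=200, width=300, height=600)
canvas.place("person_b", x=500, y=220, width=280, height=580)
canvas.resize("person_b", 0.5)
canvas.lock("person_a")
print(canvas.locked_names())
```

Keeping the lock as a per-layer flag mirrors the abstract's description of locking as something applied to selected layers while the rest stay free to adapt. How that flag is actually honored during generation is outside the scope of this sketch.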