This work presents Controllable Layer Decomposition (CLD), a method for achieving fine-grained and controllable multi-layer separation of raster images. In practical workflows, designers typically generate and edit each RGBA layer independently before compositing them into a final raster image. However, this process is irreversible: once composited, layer-level editing is no longer possible. Existing methods commonly rely on image matting and inpainting, but remain limited in controllability and segmentation precision. To address these challenges, we propose two key modules: LayerDecompose-DiT (LD-DiT), which decouples image elements into distinct layers and enables fine-grained control; and Multi-Layer Conditional Adapter (MLCA), which injects target image information into multi-layer tokens to achieve precise conditional generation. To enable a comprehensive evaluation, we build a new benchmark and introduce tailored evaluation metrics. Experimental results show that CLD consistently outperforms existing methods in both decomposition quality and controllability. Furthermore, the separated layers produced by CLD can be directly manipulated in commonly used design tools such as PowerPoint, highlighting its practical value and applicability in real-world creative workflows.
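The irreversibility of compositing noted above can be seen in a small worked example. The sketch below is illustrative only and is not part of CLD: it assumes layers are flattened with the standard Porter-Duff "over" operator on straight (non-premultiplied) alpha, and the names `over`, `flatten`, `stack_a`, and `stack_b` are hypothetical. It shows two different layer stacks that flatten to the same pixel, so the flattened value alone cannot determine the original layers.

```python
from typing import List, Tuple

# One pixel as (r, g, b, a) with straight alpha, all channels in [0, 1].
RGBA = Tuple[float, float, float, float]


def over(top: RGBA, bottom: RGBA) -> RGBA:
    """Composite one pixel of `top` over `bottom` (Porter-Duff "over")."""
    tr, tg, tb, ta = top
    br, bg, bb, ba = bottom
    out_a = ta + ba * (1.0 - ta)
    if out_a == 0.0:
        # Fully transparent result: colour is undefined, return zeros.
        return (0.0, 0.0, 0.0, 0.0)

    def blend(t: float, b: float) -> float:
        # Alpha-weighted mix of the two colours, renormalised by out_a.
        return (t * ta + b * ba * (1.0 - ta)) / out_a

    return (blend(tr, br), blend(tg, bg), blend(tb, bb), out_a)


def flatten(layers: List[RGBA]) -> RGBA:
    """Composite layers listed bottom-to-top into a single pixel."""
    result: RGBA = (0.0, 0.0, 0.0, 0.0)  # start from a transparent canvas
    for layer in layers:
        result = over(layer, result)
    return result


# Assumed example data: two distinct stacks with the same flattened result.
stack_a = [(1.0, 0.0, 0.0, 1.0), (0.0, 0.0, 1.0, 0.5)]  # opaque red under half-transparent blue
stack_b = [(0.5, 0.0, 0.5, 1.0)]                        # a single opaque purple layer

print(flatten(stack_a))
print(flatten(stack_b))
```

Because many layer stacks map to one composite, recovering layers from a raster image is an under-determined problem, which is why a decomposition method needs extra guidance such as the conditional control described in the abstract.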