Artistic style transfer in generative models remains a significant challenge: existing methods typically introduce style via model fine-tuning, additional adapters, or prompt engineering, all of which can be computationally expensive and may still entangle style with subject matter. In this paper, we introduce an interpretable method for representing and transferring artistic style that is lightweight in both training and inference. Our approach trains an art-specific Sparse Autoencoder (SAE) on top of the latent embeddings of generative image models. Trained on artistic data, the SAE learns an emergent, largely disentangled set of stylistic and compositional concepts, spanning style-related elements such as brushwork, texture, and color palette, as well as semantic and structural content. We call this model LouvreSAE and use it to construct style profiles: compact, decomposable steering vectors that enable style transfer without any model updates or optimization. Unlike prior concept-based style transfer methods, ours requires no fine-tuning, no LoRA training, and no additional inference passes, enabling direct steering toward artistic styles from only a few reference images. We validate our method on ArtBench10, matching or surpassing existing methods on style evaluations (VGG Style Loss and CLIP Style Score) while being 1.7-20x faster and, critically, interpretable.
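To make the steering idea concrete, the following is a minimal sketch, assuming a trained SAE that exposes encoder and decoder weights (`W_enc`, `b_enc`, `W_dec`). The dimensions, the top-k profile construction, and the additive steering rule are all illustrative assumptions for exposition, not LouvreSAE's actual implementation.

```python
# Hypothetical sketch of SAE-based style steering: the SAE layout, the
# top-k profile construction, and the steering rule are assumptions,
# not the paper's actual API.
import torch
from types import SimpleNamespace

d_model, d_sae = 1024, 16384  # illustrative latent / dictionary sizes
sae = SimpleNamespace(
    W_enc=torch.randn(d_model, d_sae) * 0.02,
    b_enc=torch.zeros(d_sae),
    W_dec=torch.randn(d_sae, d_model) * 0.02,
)

def build_style_profile(sae, reference_latents, k=32):
    """Pool sparse SAE activations over a few reference images and keep
    the k most active features as a compact, decomposable style profile."""
    acts = torch.relu(reference_latents @ sae.W_enc + sae.b_enc)  # (n, d_sae)
    mean_acts = acts.mean(dim=0)                                  # pool references
    vals, idx = torch.topk(mean_acts, k)                          # dominant concepts
    profile = torch.zeros_like(mean_acts)
    profile[idx] = vals
    return profile                                                # (d_sae,)

def steer(latents, sae, profile, alpha=1.0):
    """Nudge generation latents along the profile's decoder directions:
    no fine-tuning, no LoRA, no extra inference passes."""
    return latents + alpha * (profile @ sae.W_dec)

# Usage: a few reference latents define the style; new latents are steered.
refs = torch.randn(5, d_model)      # latents of reference images (assumed given)
profile = build_style_profile(sae, refs)
styled = steer(torch.randn(8, d_model), sae, profile, alpha=0.8)
```

Because the profile lives in the SAE's concept basis, it stays decomposable: individual features (e.g., a brushwork or palette concept) can in principle be inspected, rescaled, or dropped before steering, which is what makes the approach interpretable.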