AI-assisted graphic design has emerged as a powerful tool for automating the creation and editing of design elements such as posters, banners, and advertisements. While diffusion-based text-to-image models have demonstrated strong capabilities in visual content generation, their text rendering performance, particularly for small-scale typography and non-Latin scripts, remains limited. In this paper, we propose UTDesign, a unified framework for high-precision stylized text editing and conditional text generation in design images, supporting both English and Chinese scripts. Our framework introduces a novel DiT-based text style transfer model trained from scratch on a synthetic dataset, capable of generating transparent RGBA text foregrounds that preserve the style of reference glyphs. We further extend this model into a conditional text generation framework by training a multi-modal condition encoder on a curated dataset with detailed text annotations, enabling accurate, style-consistent text synthesis conditioned on background images, prompts, and layout specifications. Finally, we integrate our approach into a fully automated text-to-design (T2D) pipeline by incorporating pre-trained text-to-image (T2I) models and an MLLM-based layout planner. Extensive experiments demonstrate that UTDesign achieves state-of-the-art performance among open-source methods in terms of stylistic consistency and text accuracy, and also exhibits unique advantages compared to proprietary commercial approaches. Code and data for this paper are available at https://github.com/ZYM-PKU/UTDesign.