Scalable Vector Graphics (SVG) is an important image format widely adopted in graphic design because of its resolution independence and editability. The generation of high-quality SVG has continuously drawn attention from both designers and researchers in the AIGC community. However, existing methods either produce unstructured outputs at huge computational cost or are limited to generating monochrome icons with over-simplified structures. To produce high-quality and complex SVG, we propose OmniSVG, a unified framework that leverages pre-trained Vision-Language Models (VLMs) for end-to-end multimodal SVG generation. By parameterizing SVG commands and coordinates into discrete tokens, OmniSVG decouples structural logic from low-level geometry for efficient training while maintaining the expressiveness of complex SVG structures. To further advance the development of SVG synthesis, we introduce MMSVG-2M, a multimodal dataset with two million richly annotated SVG assets, along with a standardized evaluation protocol for conditional SVG generation tasks. Extensive experiments show that OmniSVG outperforms existing methods and demonstrates its potential for integration into professional SVG design workflows.
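To make the tokenization idea concrete, the sketch below shows one possible way to map SVG path commands and quantized coordinates to discrete token ids. It is an illustrative assumption only: the command subset, the bin count `COORD_BINS`, and the viewbox size are hypothetical and are not the paper's actual parameterization.

```python
# Illustrative sketch: a hypothetical tokenizer that maps SVG path commands
# and quantized coordinates to discrete token ids. OmniSVG's actual
# parameterization is not reproduced here and may differ.

COMMANDS = ["M", "L", "C", "Q", "Z"]   # assumed subset of SVG path commands
COORD_BINS = 200                        # assumed coordinate quantization resolution
CMD_OFFSET = 0
COORD_OFFSET = len(COMMANDS)            # coordinate ids follow command ids


def quantize(value: float, viewbox: float = 200.0) -> int:
    """Map a coordinate in [0, viewbox] to one of COORD_BINS discrete bins."""
    bin_idx = int(round(value / viewbox * (COORD_BINS - 1)))
    return max(0, min(COORD_BINS - 1, bin_idx))


def tokenize_path(path: list[tuple[str, list[float]]]) -> list[int]:
    """Flatten a parsed path [(cmd, [x0, y0, ...]), ...] into a token sequence."""
    tokens = []
    for cmd, coords in path:
        tokens.append(CMD_OFFSET + COMMANDS.index(cmd))           # command token
        tokens.extend(COORD_OFFSET + quantize(c) for c in coords)  # coordinate tokens
    return tokens


# Usage: tokenize a triangle equivalent to "M 10 10 L 190 10 L 100 180 Z"
triangle = [("M", [10, 10]), ("L", [190, 10]), ("L", [100, 180]), ("Z", [])]
print(tokenize_path(triangle))
```

Separating command tokens from coordinate tokens in this way keeps the structural vocabulary small while letting the coordinate vocabulary control geometric precision, which is one plausible reading of how such a scheme decouples structural logic from low-level geometry.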