Recent advances in image generation and editing have enabled state-of-the-art models to achieve impressive results in general domains. However, when applied to e-commerce scenarios, these general-purpose models often struggle with consistency, that is, with preserving the appearance and layout of the product being edited. To address this challenge, we introduce TBStar-Edit, a new image editing model tailored to the e-commerce domain. Through rigorous data engineering, model architecture design, and training strategy, TBStar-Edit achieves precise, high-fidelity image editing while maintaining the integrity of product appearance and layout. For data engineering, we establish a comprehensive data pipeline covering collection, construction, filtering, and augmentation, which yields high-quality, instruction-following, and strongly consistent editing data for model training. For model architecture, we adopt a hierarchical framework consisting of a base model, pattern shifting modules, and consistency enhancement modules. For model training, we use a two-stage strategy to improve consistency preservation: the first stage performs editing pattern shifting, and the second stage performs consistency enhancement. Each stage trains different modules on a separate dataset. Finally, we conduct extensive evaluations of TBStar-Edit on an e-commerce benchmark that we propose, and the results show that TBStar-Edit outperforms existing general-domain editing models in both objective metrics (VIE Score) and subjective user preference.
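The two-stage schedule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class names, the dataset labels, and in particular the assumption that the base model stays frozen and that exactly one module group is trained per stage are all hypothetical, chosen only to make the idea of "different modules, separate datasets" concrete.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """Stand-in for a group of network parameters (hypothetical)."""
    name: str
    trainable: bool = False


@dataclass
class HierarchicalEditor:
    """Three-part framework named in the abstract: base model,
    pattern shifting modules, consistency enhancement modules."""
    base: Module = field(default_factory=lambda: Module("base_model"))
    pattern_shift: Module = field(default_factory=lambda: Module("pattern_shifting"))
    consistency: Module = field(default_factory=lambda: Module("consistency_enhancement"))

    def modules(self):
        return [self.base, self.pattern_shift, self.consistency]


# Assumed mapping from stage to (module group to train, dataset label).
# The dataset labels are placeholders, not names from the paper.
STAGES = {
    1: {"train": "pattern_shifting", "dataset": "stage1_editing_data"},
    2: {"train": "consistency_enhancement", "dataset": "stage2_consistency_data"},
}


def configure_stage(model: HierarchicalEditor, stage: int) -> str:
    """Mark only the stage's module group as trainable and
    return the label of the dataset assumed for that stage."""
    plan = STAGES[stage]
    for module in model.modules():
        module.trainable = module.name == plan["train"]
    return plan["dataset"]


def trainable_names(model: HierarchicalEditor) -> list:
    """List the names of module groups currently marked trainable."""
    return [m.name for m in model.modules() if m.trainable]
```

Calling `configure_stage(model, 1)` and then `configure_stage(model, 2)` switches the trainable group from the pattern shifting modules to the consistency enhancement modules. In a real framework the `trainable` flag would correspond to whatever mechanism enables or disables gradient updates for a set of parameters.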