This study addresses two industrial challenges in e-commerce product categorization, platform heterogeneity and the structural limitations of existing taxonomies, by developing and deploying a multimodal hierarchical classification framework. Using a dataset of 271,700 products from 40 international fashion e-commerce platforms, we integrate textual features (RoBERTa), visual features (ViT), and joint vision-language representations (CLIP). We compare early, late, and attention-based fusion strategies within a hierarchical architecture that uses dynamic masking to keep predictions taxonomically consistent. Results show that CLIP embeddings combined via an MLP-based late-fusion strategy achieve the highest hierarchical F1 (98.59%), outperforming unimodal baselines. To address shallow or inconsistent categories, we further introduce a self-supervised "product recategorization" pipeline using SimCLR, UMAP, and cascade clustering, which discovered new, fine-grained categories (for example, subtypes of "Shoes") with cluster purities above 86%. Cross-platform experiments reveal a deployment-relevant trade-off: complex late-fusion methods maximize accuracy when the training data is diverse, while simpler early-fusion methods generalize more effectively to unseen platforms. Finally, we demonstrate the framework's industrial scalability through deployment in EURWEB's commercial transaction intelligence platform via a two-stage inference pipeline that combines a lightweight RoBERTa stage with a GPU-accelerated multimodal stage to balance cost and accuracy.
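The abstract does not spell out how dynamic masking is implemented. As a minimal sketch of one common reading, assume a two-level taxonomy in which the child-level logits are masked so that only children of the predicted parent can be selected. The category names, the `CHILD_TO_PARENT` mapping, and the function `masked_child_prediction` are hypothetical and chosen for illustration only; they are not taken from the paper.

```python
import numpy as np

# Hypothetical two-level taxonomy; names are illustrative, not from the paper.
PARENTS = ["Shoes", "Clothing"]
CHILDREN = ["Sneakers", "Boots", "Dresses", "Jackets"]
# CHILD_TO_PARENT[j] is the index of the parent category of child j.
CHILD_TO_PARENT = np.array([0, 0, 1, 1])


def masked_child_prediction(parent_logits, child_logits):
    """Pick a parent, then pick the best child among that parent's children."""
    parent = int(np.argmax(parent_logits))
    # Children of other parents get -inf so they can never win the argmax.
    mask = np.where(CHILD_TO_PARENT == parent, 0.0, -np.inf)
    child = int(np.argmax(child_logits + mask))
    return PARENTS[parent], CHILDREN[child]


parent_logits = np.array([2.0, 0.5])
child_logits = np.array([0.1, 0.3, 1.5, 0.2])
# Without the mask, the child-level argmax alone would be "Dresses",
# which is inconsistent with the predicted parent "Shoes".
print(masked_child_prediction(parent_logits, child_logits))
```

In this sketch the mask is recomputed per example from the parent prediction, which is what makes it "dynamic"; a trained model could apply the same mask to the logits before the softmax so that the loss also respects the taxonomy.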