Cross-Platform E-Commerce Product Categorization and Recategorization: A Multimodal Hierarchical Classification Approach

This study addresses two critical industrial challenges in e-commerce product categorization, namely platform heterogeneity and the structural limitations of existing taxonomies, by developing and deploying a multimodal hierarchical classification framework. Using a dataset of 271,700 products from 40 international fashion e-commerce platforms, we integrate textual features (RoBERTa), visual features (ViT), and joint vision-language representations (CLIP). We investigate early, late, and attention-based fusion strategies within a hierarchical architecture that uses dynamic masking to enforce taxonomic consistency. Results show that CLIP embeddings combined via an MLP-based late-fusion strategy achieve the highest hierarchical F1 score (98.59%), outperforming unimodal baselines. To address shallow or inconsistent categories, we further introduce a self-supervised "product recategorization" pipeline based on SimCLR, UMAP, and cascade clustering, which discovers new fine-grained categories (for example, subtypes of "Shoes") with cluster purities above 86%. Cross-platform experiments reveal a deployment-relevant trade-off: complex late-fusion methods maximize accuracy when training data are diverse, whereas simpler early-fusion methods generalize better to unseen platforms. Finally, we demonstrate the framework's industrial scalability by deploying it in EURWEB's commercial transaction intelligence platform through a two-stage inference pipeline that pairs a lightweight RoBERTa stage with a GPU-accelerated multimodal stage to balance cost and accuracy.
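
The abstract does not say how dynamic masking is implemented. One common way to enforce taxonomic consistency, sketched here under that assumption, is to suppress the scores of child categories that do not belong to the predicted parent before taking the argmax, so an inconsistent pair (a child from a different branch than the predicted parent) can never be emitted. The function name `masked_hierarchical_predict`, the two-level taxonomy, and the `child_to_parent` index array are hypothetical illustrations, not the authors' code.

```python
import numpy as np


def masked_hierarchical_predict(parent_logits, child_logits, child_to_parent):
    """Hypothetical inference-time masking for a two-level taxonomy (assumed, not the paper's code).

    parent_logits:   (batch, n_parents) scores for level-1 categories
    child_logits:    (batch, n_children) scores for level-2 categories
    child_to_parent: (n_children,) index of the parent of each child category
    Assumes every parent has at least one child; otherwise a row would be all -inf.
    """
    parent_pred = parent_logits.argmax(axis=1)  # (batch,)
    # True where a child category belongs to the parent predicted for that row.
    allowed = child_to_parent[None, :] == parent_pred[:, None]  # (batch, n_children)
    # Disallowed children get -inf so they can never win the argmax.
    masked = np.where(allowed, child_logits, -np.inf)
    child_pred = masked.argmax(axis=1)
    return parent_pred, child_pred


# Toy example: 2 parents, 4 children (children 0-1 under parent 0, children 2-3 under parent 1).
parent_logits = np.array([[2.0, 0.5], [0.1, 1.0]])
child_logits = np.array([[0.0, 0.2, 3.0, 0.1], [5.0, 0.3, 0.4, 0.2]])
child_to_parent = np.array([0, 0, 1, 1])

parent_pred, child_pred = masked_hierarchical_predict(parent_logits, child_logits, child_to_parent)
# Without masking, the child argmax would be [2, 0], which contradicts the parents [0, 1].
print(parent_pred.tolist(), child_pred.tolist())  # [0, 1] [1, 2]
```

Masking at inference only guarantees consistent outputs; whether the authors also mask during training (for example, inside the loss) is not stated in the abstract.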

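
Cluster purity, the quality measure quoted for the recategorization pipeline, is conventionally defined as the fraction of items that carry the majority reference label of their cluster: purity = (1/N) Σ_k max_j |c_k ∩ t_j|. The sketch here computes that standard definition. It assumes reference labels are available for evaluation; the abstract does not say how the authors obtained theirs, and the example labels and the helper name `cluster_purity` are hypothetical.

```python
from collections import Counter


def cluster_purity(cluster_ids, true_labels):
    """Standard purity: sum of majority-label counts per cluster, divided by N."""
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have the same length")
    if not true_labels:
        raise ValueError("purity is undefined for an empty assignment")
    # Count reference labels inside each cluster.
    per_cluster = {}
    for cluster, label in zip(cluster_ids, true_labels):
        per_cluster.setdefault(cluster, Counter())[label] += 1
    # Each cluster contributes the size of its most frequent reference label.
    majority_total = sum(counts.most_common(1)[0][1] for counts in per_cluster.values())
    return majority_total / len(true_labels)


# Hypothetical "Shoes" subclusters: cluster 0 is mostly sneakers, cluster 1 mostly boots.
clusters = [0, 0, 0, 1, 1, 1, 1]
labels = ["sneaker", "sneaker", "boot", "boot", "boot", "boot", "sandal"]
print(round(cluster_purity(clusters, labels), 3))  # 0.714  (= (2 + 3) / 7)
```

Purity rewards over-splitting (placing every item in its own cluster yields 1.0), so a purity figure is best read alongside the number of clusters discovered.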