With the prosperity of the e-commerce industry, various modalities, e.g., vision and language, are utilized to describe product items. Understanding such diversified data is an enormous challenge, especially when extracting attribute-value pairs from text sequences with the aid of helpful image regions. Although a series of previous works has been dedicated to this task, several seldom-investigated obstacles still hinder further improvement: 1) Parameters from upstream single-modal pretraining are inadequately applied, without proper joint fine-tuning on the downstream multi-modal task. 2) To select descriptive parts of images, simple late fusion is widely applied, ignoring the prior knowledge that language-related information should be encoded into a common linguistic embedding space by stronger encoders. 3) Owing to the diversity across products, attribute sets vary greatly, yet current approaches predict over an unnecessarily maximal attribute range, leading to more potential false positives. To address these issues, we propose a novel approach that boosts multi-modal e-commerce attribute value extraction via a unified learning scheme and dynamic range minimization: 1) First, a unified scheme is designed to jointly train the multi-modal task with pretrained single-modal parameters. 2) Second, a text-guided information range minimization method is proposed to adaptively encode descriptive parts of each modality into an identical space with a powerful pretrained linguistic model. 3) Moreover, a prototype-guided attribute range minimization method is proposed to first determine the proper attribute set of the current product and then select prototypes to guide the prediction of the chosen attributes. Experiments on popular multi-modal e-commerce benchmarks show that our approach achieves superior performance over other state-of-the-art techniques.
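To make contribution 1) concrete, the following is a minimal sketch of one common way to realize such a unified scheme: all pretrained single-modal parameters are fine-tuned jointly with the newly added multi-modal layers, with a smaller learning rate on the pretrained groups. The module names `text_encoder` and `image_encoder` and the learning rates are hypothetical; the abstract does not specify the exact schedule.

```python
import torch

def build_optimizer(model, lr_pretrained=1e-5, lr_new=1e-4):
    """Jointly fine-tune pretrained single-modal encoders and new
    multi-modal layers, using a smaller LR for pretrained weights."""
    pretrained, new = [], []
    for name, param in model.named_parameters():
        # 'text_encoder' / 'image_encoder' are hypothetical module names
        # for the upstream single-modal pretrained components.
        if name.startswith(('text_encoder', 'image_encoder')):
            pretrained.append(param)
        else:
            new.append(param)
    return torch.optim.AdamW([
        {'params': pretrained, 'lr': lr_pretrained},
        {'params': new, 'lr': lr_new},
    ])
```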
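For contribution 2), a plausible reading is that text tokens from the pretrained linguistic model query image regions via cross-attention, after the regions are projected into the linguistic embedding space, so only the descriptive regions contribute. The sketch below illustrates that idea under assumed dimensions (e.g., 2048-d region features from a detection backbone); it is not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class TextGuidedRegionFusion(nn.Module):
    """Projects image region features into the linguistic embedding space
    and lets text tokens attend to the descriptive regions."""
    def __init__(self, d_text=768, d_region=2048, n_heads=8):
        super().__init__()
        # Map visual region features into the text encoder's space.
        self.region_proj = nn.Linear(d_region, d_text)
        # Text tokens act as queries; projected regions as keys/values.
        self.cross_attn = nn.MultiheadAttention(d_text, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_text)

    def forward(self, text_hidden, region_feats):
        # text_hidden: (B, L, d_text) from the pretrained language model
        # region_feats: (B, R, d_region) from a visual backbone
        regions = self.region_proj(region_feats)              # (B, R, d_text)
        fused, attn = self.cross_attn(text_hidden, regions, regions)
        # Residual connection keeps the linguistic representation dominant.
        return self.norm(text_hidden + fused), attn
```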
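Contribution 3) suggests a two-stage pipeline: a product-level multi-label gate first minimizes the attribute range, then learned per-attribute prototypes condition value prediction for the selected attributes only. The sketch below, with a hypothetical BIO-style tagging head and an assumed gating threshold, shows one way to wire the two stages.

```python
import torch
import torch.nn as nn

class PrototypeGuidedTagger(nn.Module):
    def __init__(self, d_model=768, n_attrs=100, n_tags=3):
        super().__init__()
        # One learned prototype vector per attribute.
        self.prototypes = nn.Parameter(torch.randn(n_attrs, d_model))
        self.attr_gate = nn.Linear(d_model, n_attrs)   # stage 1: multi-label gate
        self.tagger = nn.Linear(2 * d_model, n_tags)   # stage 2: e.g. BIO value tags

    def forward(self, token_hidden, pooled, threshold=0.5):
        # pooled: (B, d) product representation; token_hidden: (B, L, d)
        attr_probs = torch.sigmoid(self.attr_gate(pooled))  # (B, A)
        active = attr_probs > threshold                     # minimized attribute range
        B, L, d = token_hidden.shape
        A = self.prototypes.shape[0]
        # Condition every token on every attribute prototype.
        protos = self.prototypes.view(1, A, 1, d).expand(B, A, L, d)
        tokens = token_hidden.unsqueeze(1).expand(B, A, L, d)
        logits = self.tagger(torch.cat([tokens, protos], dim=-1))  # (B, A, L, n_tags)
        # Values are decoded only for attributes in 'active', so attributes
        # outside the minimized range cannot yield false positives.
        return attr_probs, logits, active
```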