PAM: Understanding Product Images in Cross Product Category Attribute Extraction

Understanding product attributes plays an important role in improving the online shopping experience for customers and is an integral part of constructing a product knowledge graph. Most existing methods focus on attribute extraction from text descriptions, or use visual information from product images such as shape and color. Compared to the inputs considered in prior work, a product image in fact contains more information, represented by a rich mixture of words and visual clues with a layout carefully designed to impress customers. This work proposes a more inclusive framework that fully utilizes these different modalities for attribute extraction. Inspired by recent work in visual question answering, we use a transformer-based sequence-to-sequence model to fuse representations of product text, Optical Character Recognition (OCR) tokens, and visual objects detected in the product image. The framework is further extended to extract attribute values across multiple product categories with a single model, by training the decoder to predict both the product category and the attribute value and by conditioning its output on the product category. The model provides a unified attribute extraction solution that is desirable for an e-commerce platform offering numerous product categories with a diverse body of product attributes. We evaluated the model on two product attributes, one with many possible values and one with a small set of possible values, over 14 product categories, and found that the model achieves a 15% gain in recall and a 10% gain in F1 score compared to existing methods that use text-only features.
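To make the idea of conditioning the output on the product category more concrete, the following is a minimal sketch of a two-step greedy decode: first choose a category, then choose an attribute value from the values allowed for that category. It is only an illustration under stated assumptions, not the paper's implementation. The function name `decode_category_then_value`, the per-category value sets, and all scores are hypothetical; in the actual model the scores would come from the transformer decoder, and the abstract does not specify how the conditioning is realized.

```python
def decode_category_then_value(category_scores, value_scores, category_values):
    """Greedy two-step decode (illustrative assumption, not the paper's method).

    category_scores: dict mapping category name -> decoder score (hypothetical)
    value_scores:    dict mapping candidate attribute value -> decoder score
    category_values: dict mapping category name -> set of values allowed for it
    """
    # Step 1: predict the product category with the highest score.
    category = max(category_scores, key=category_scores.get)

    # Step 2: keep only candidate values permitted for the predicted category.
    allowed = category_values.get(category, set())
    candidates = {v: s for v, s in value_scores.items() if v in allowed}
    if not candidates:
        # No admissible value for this category.
        return category, None

    # Step 3: pick the best-scoring value among the remaining candidates.
    value = max(candidates, key=candidates.get)
    return category, value


# Toy example with made-up scores for a "flavor/scent" style attribute.
example_category_scores = {"shampoo": 2.1, "coffee": 0.3}
example_value_scores = {"lavender": 1.2, "espresso": 1.9, "mint": 0.4}
example_category_values = {
    "shampoo": {"lavender", "mint"},
    "coffee": {"espresso"},
}

example_result = decode_category_then_value(
    example_category_scores, example_value_scores, example_category_values
)
print(example_result)
```

In this toy input, "espresso" has the highest raw score, but it is filtered out once "shampoo" is chosen as the category, which is the kind of cross-category confusion that category conditioning is meant to reduce.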
