In this paper, we propose a method for identifying identical commodities. In e-commerce scenarios, commodities are usually described by both images and text. By definition, identical commodities are those that share the same key attributes and that consumers perceive as the same item. There are two main challenges: 1) extracting and fusing multimodal representations, and 2) verifying whether two commodities are identical by comparing the distance between their representations with a threshold. To address these challenges, we propose an end-to-end identical commodity verification method based on self-adaptive thresholds. We use a dual-stream network to extract commodity embeddings and threshold embeddings separately, and then concatenate them to obtain the commodity representation. Our method obtains different thresholds for different commodities while keeping the entire commodity representation indexable. We experimentally validate the effectiveness of our multimodal feature fusion and the advantages of self-adaptive thresholds. In addition, our method achieves an F1 score of 0.8936 and ranks third on the leaderboard for the second task of the CCKS-2022 Knowledge Graph Evaluation for Digital Commerce Competition. Code and pretrained models are available at https://github.com/hanchenchen/CCKS2022-track2-solution.
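To illustrate how a per-commodity threshold can live inside a representation that remains indexable, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the decision rule, so the rule used here (cosine similarity of the commodity embeddings minus the inner product of the threshold embeddings, compared with zero) is an assumption, as are all function names and the toy vectors. Under this assumed rule, the verification score reduces to a single inner product between a query vector and a stored vector, which is the form that inner-product search indexes operate on.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so that dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def build_representation(commodity_emb, threshold_emb):
    """Concatenate a unit-norm commodity embedding with a threshold embedding.

    Hypothetical stand-in for the output of the two streams; in the paper
    both parts would come from a trained dual-stream network.
    """
    return l2_normalize(commodity_emb) + list(threshold_emb)


def verification_score(rep_a, rep_b, threshold_dim=1):
    """Assumed rule: similarity of commodity parts minus a pair-specific threshold."""
    split = len(rep_a) - threshold_dim
    similarity = dot(rep_a[:split], rep_b[:split])
    adaptive_threshold = dot(rep_a[split:], rep_b[split:])
    return similarity - adaptive_threshold


def is_identical(rep_a, rep_b, threshold_dim=1):
    return verification_score(rep_a, rep_b, threshold_dim) >= 0.0


def as_query(rep, threshold_dim=1):
    """Negate the threshold part so that dot(as_query(a), b) equals the score.

    This keeps the whole representation searchable with one inner product.
    """
    split = len(rep) - threshold_dim
    return rep[:split] + [-x for x in rep[split:]]


# Two pairs with the same embedding similarity but different thresholds.
a_loose = build_representation([1.0, 0.0], [0.9])
b_loose = build_representation([0.9, 0.1], [0.9])
a_strict = build_representation([1.0, 0.0], [1.0])
b_strict = build_representation([0.9, 0.1], [1.0])

print(is_identical(a_loose, b_loose))    # True: similarity exceeds the threshold
print(is_identical(a_strict, b_strict))  # False: stricter threshold rejects the pair
```

In this toy setting, both pairs have the same cosine similarity, yet only the pair with the looser threshold is accepted, which is the behavior a single global threshold cannot express.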