Extracting structured information from unstructured data is one of the key challenges in modern information retrieval applications, including e-commerce. Here, we demonstrate how recent advances in machine learning, combined with a recently published multilingual data set with standardized fine-grained product category information, enable robust product attribute extraction in challenging transfer learning settings. Our models can reliably predict product attributes across online shops, languages, or both. Furthermore, we show that our models can be used to match product taxonomies between online retailers.