A key challenge in attribute value extraction (AVE) from e-commerce sites is handling a large number of attributes across diverse products. A question answering (QA) approach, which finds a value in product data for a given query (attribute), partially addresses this challenge, but it does not work effectively for rare and ambiguous queries. We thus propose a simple knowledge-driven query expansion method for QA-based AVE that uses the possible answers (values) of a query (attribute). We retrieve values of the query (attribute) from the training data and use them to expand the query. We train the model with two tricks, knowledge dropout and knowledge token mixing, which mimic the imperfection of the value knowledge at test time. Experimental results on our cleaned version of the AliExpress dataset show that our method improves AVE performance (+6.08 macro F1), especially for rare and ambiguous attributes (+7.82 and +6.86 macro F1, respectively).
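The retrieval-based query expansion and knowledge dropout described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions and is not the authors' implementation. The function names `build_value_knowledge` and `expand_query`, the `" | "` separator, the comma-joined value list, and the choice to drop each retrieved value independently with probability `dropout` are all hypothetical, because the abstract does not specify the input format or the granularity of knowledge dropout. Knowledge token mixing is not sketched because the abstract does not describe how it works.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple


def build_value_knowledge(train: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each attribute (query) to the distinct values seen for it in training data.

    Values keep their first-seen order so the expanded query is deterministic.
    """
    knowledge: Dict[str, List[str]] = defaultdict(list)
    seen: Set[Tuple[str, str]] = set()
    for attribute, value in train:
        if (attribute, value) not in seen:
            seen.add((attribute, value))
            knowledge[attribute].append(value)
    return dict(knowledge)


def expand_query(
    attribute: str,
    knowledge: Dict[str, List[str]],
    sep: str = " | ",          # hypothetical separator between query and values
    dropout: float = 0.0,      # hypothetical per-value drop probability (training only)
    rng: Optional[random.Random] = None,
) -> str:
    """Append retrieved values to the attribute query.

    ASSUMPTION: knowledge dropout is modeled as dropping each retrieved value
    independently with probability `dropout`; the abstract does not specify this.
    """
    rng = rng if rng is not None else random.Random()
    values = knowledge.get(attribute, [])
    kept = [v for v in values if rng.random() >= dropout]
    if not kept:
        # Unseen attribute, or all values dropped: fall back to the bare query.
        return attribute
    return attribute + sep + ", ".join(kept)


if __name__ == "__main__":
    train_pairs = [("color", "red"), ("color", "blue"), ("color", "red"), ("material", "cotton")]
    kb = build_value_knowledge(train_pairs)
    print(expand_query("color", kb))                  # all values (test-time setting)
    print(expand_query("color", kb, dropout=0.5,
                       rng=random.Random(0)))         # some values may be dropped (training)
    print(expand_query("size", kb))                   # attribute unseen in training
```

In this sketch, a nonzero `dropout` at training time exposes the model to queries whose value knowledge is incomplete or missing, which is the test-time imperfection the abstract says the tricks are meant to mimic. Setting `dropout=0.0` at inference uses every retrieved value.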