Consumers often read product reviews to inform their buying decisions, and some want to know about specific components of a product. However, because sentences in product reviews typically contain various details, users must find the sentences about the components they are interested in among many reviews. Therefore, we aimed to develop a system that identifies and collects sentences containing information about product components and aspects. Our BERT-based classifiers assign component and aspect labels to review sentences and extract the sentences that comment on specific components and aspects. To create the training data, we determined appropriate labels from the words identified in product reviews through pattern matching. Because we could not use these words directly as labels, we carefully created labels that cover their meanings. However, the training data was imbalanced across component-aspect pairs, so we introduced a data augmentation method using WordNet to reduce this bias. Our evaluation demonstrates that, for road bikes, the labels determined through pattern matching cover more than 88\% of the component and aspect indicators used on e-commerce sites. Moreover, our data augmentation method improves the F1-measure on component-aspect pairs with insufficient training data from 0.66 to 0.76.
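The abstract does not spell out how the WordNet-based augmentation works. One common way to reduce label imbalance with a thesaurus is synonym replacement: new sentences are generated for under-represented component-aspect pairs by swapping single words for synonyms until each pair reaches the size of the largest one. The sketch below illustrates that idea only, under stated assumptions: the `SYNONYMS` table is a hypothetical stand-in for WordNet lookups, and the labels, example sentences, and the "oversample to the largest class" target are all illustrative choices, not the authors' method.

```python
import random
from collections import Counter

# Hypothetical stand-in for WordNet synonym lookups. A real pipeline would
# query WordNet (e.g., lemma names of a word's synsets) instead of this dict.
SYNONYMS = {
    "smooth": ["silky", "fluid"],
    "light": ["lightweight"],
    "shifting": ["changing"],
}

def synonym_variants(sentence, synonyms, rng, max_variants):
    """Return up to max_variants distinct sentences, each differing from
    the input by one word replaced with a synonym."""
    tokens = sentence.split()
    options = []
    for i, tok in enumerate(tokens):
        for syn in synonyms.get(tok.lower(), []):
            new_tokens = tokens.copy()
            new_tokens[i] = syn
            options.append(" ".join(new_tokens))
    rng.shuffle(options)
    variants, seen = [], {sentence}
    for s in options:
        if len(variants) >= max_variants:
            break
        if s not in seen:
            seen.add(s)
            variants.append(s)
    return variants

def augment_minority(examples, synonyms, seed=0):
    """Oversample each under-represented (component, aspect) label with
    synonym-replaced sentences until it matches the largest label's count."""
    rng = random.Random(seed)
    counts = Counter(label for _, label in examples)
    target = max(counts.values())
    augmented = list(examples)
    for label, n in counts.items():
        needed = target - n
        if needed <= 0:
            continue
        pool = [s for s, l in examples if l == label]
        for sent in pool:
            for v in synonym_variants(sent, synonyms, rng, max_variants=needed):
                augmented.append((v, label))
                needed -= 1
            if needed == 0:
                break
    return augmented

# Hypothetical labeled review sentences: (sentence, (component, aspect)).
EXAMPLES = [
    ("The frame is light", ("frame", "weight")),
    ("The frame feels very light", ("frame", "weight")),
    ("This frame is light and stiff", ("frame", "weight")),
    ("Shifting is smooth", ("gear", "operability")),
]

if __name__ == "__main__":
    for sentence, label in augment_minority(EXAMPLES, SYNONYMS):
        print(label, sentence)
```

Here the single `("gear", "operability")` sentence is expanded to three (the original plus two synonym variants such as "Shifting is silky"), matching the three `("frame", "weight")` sentences. Replacing only one word per variant is a deliberately conservative choice, since multi-word swaps are more likely to change the meaning of a component or aspect mention.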