E-commerce is the fastest-growing segment of the economy, and online reviews play a crucial role in helping consumers evaluate and compare products and services. As a result, fake reviews (opinion spam) are becoming more prevalent, harming both customers and service providers. Identifying opinion spammers automatically is hard for many reasons, including the absence of reliable labeled data, which precludes applying an off-the-shelf machine learning pipeline. We propose a new method for classifying reviewers as spammers or benign users that combines machine learning with a message-passing algorithm; the algorithm exploits the structure of the user graph to compensate for the possible scarcity of labeled data. We also devise a new active learning strategy for sampling the labels used in the training step, replacing the typical uniform sampling. Experiments on three large real-world datasets from Yelp.com show that our method outperforms state-of-the-art active learning approaches, as well as machine learning methods trained on a much larger set of labeled data.
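The abstract does not spell out its algorithms, so the following is only a generic sketch, under our own assumptions, of the two ideas it names. It is not the authors' method. It runs loopy belief propagation (sum-product message passing) on an undirected user graph, using a classifier's spam probabilities as node priors and clamping the reviewers whose labels are known. It then applies uncertainty sampling to choose which reviewers to label next. The homophily edge potential (`eps`), the clamping values, and the names `loopy_bp` and `pick_queries` are hypothetical. Uncertainty sampling is a standard non-uniform baseline, not the new sampling strategy the abstract proposes.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[float, float]  # (P(benign), P(spammer))


def normalize(v: Pair) -> Pair:
    s = v[0] + v[1]
    return (v[0] / s, v[1] / s)


def loopy_bp(edges: List[Tuple[int, int]], priors: Dict[int, float],
             labels: Dict[int, int], eps: float = 0.2,
             iters: int = 20) -> Dict[int, float]:
    """Return each user's posterior spam probability (hypothetical sketch).

    priors: classifier spam probability for EVERY user in the graph.
    labels: known labels (1 = spammer, 0 = benign) for a few users.
    """
    # Adjacency sets for the undirected user graph (duplicate edges collapse).
    nbrs: Dict[int, Set[int]] = {n: set() for n in priors}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    # Node potentials: classifier priors, clamped for labeled users.
    phi: Dict[int, Pair] = {}
    for n, p in priors.items():
        if n in labels:
            phi[n] = (0.01, 0.99) if labels[n] == 1 else (0.99, 0.01)
        else:
            phi[n] = (1.0 - p, p)
    # Assumed homophily potential: linked users tend to share a label.
    psi = [[1.0 - eps, eps], [eps, 1.0 - eps]]
    # msgs[(i, j)] is the message from i to j, initialized uniform.
    msgs: Dict[Tuple[int, int], Pair] = {
        (i, j): (0.5, 0.5) for i in nbrs for j in nbrs[i]}
    for _ in range(iters):
        new: Dict[Tuple[int, int], Pair] = {}
        for (i, j) in msgs:
            # Prior at i times all incoming messages except the one from j.
            h = [phi[i][0], phi[i][1]]
            for k in nbrs[i]:
                if k != j:
                    h[0] *= msgs[(k, i)][0]
                    h[1] *= msgs[(k, i)][1]
            # Sum over i's states to get the message for each state of j.
            out = (h[0] * psi[0][0] + h[1] * psi[1][0],
                   h[0] * psi[0][1] + h[1] * psi[1][1])
            new[(i, j)] = normalize(out)
        msgs = new  # synchronous (flooding) update
    beliefs: Dict[int, float] = {}
    for n in nbrs:
        b = [phi[n][0], phi[n][1]]
        for k in nbrs[n]:
            b[0] *= msgs[(k, n)][0]
            b[1] *= msgs[(k, n)][1]
        beliefs[n] = normalize((b[0], b[1]))[1]
    return beliefs


def pick_queries(beliefs: Dict[int, float], labels: Dict[int, int],
                 budget: int) -> List[int]:
    """Uncertainty sampling: unlabeled users whose belief is closest to 0.5."""
    unlabeled = [n for n in beliefs if n not in labels]
    unlabeled.sort(key=lambda n: abs(beliefs[n] - 0.5))
    return unlabeled[:budget]
```

For example, on a path graph 0-1-2-3 with reviewer 0 labeled as a spammer and a low classifier prior for reviewer 3, the belief for reviewer 1 is pulled above 0.5. `pick_queries` then selects reviewer 2, whose belief is closest to 0.5. On a tree like this, belief propagation gives exact marginals. On graphs with cycles it is only an approximation and may not converge, so a fixed iteration cap is used.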