Opinion Spam Detection: A New Approach Using Machine Learning and Network-Based Algorithms

E-commerce is the fastest-growing segment of the economy. Online reviews play a crucial role in helping consumers evaluate and compare products and services. As a result, fake reviews (opinion spam) are becoming more prevalent and negatively impacting customers and service providers. There are many reasons why it is hard to identify opinion spammers automatically, including the absence of reliable labeled data. This limitation precludes an off-the-shelf application of a machine learning pipeline. We propose a new method for classifying reviewers as spammers or benign, combining machine learning with a message-passing algorithm that capitalizes on the users' graph structure to compensate for the possible scarcity of labeled data. We devise a new way of sampling the labels for the training step (active learning), replacing the typical uniform sampling. Experiments on three large real-world datasets from Yelp.com show that our method outperforms state-of-the-art active learning approaches and also machine learning methods that use a much larger set of labeled data for training.
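The abstract names two ingredients without giving details: a message-passing step over the reviewer graph, and a non-uniform choice of which reviewers to label. The following is a minimal sketch of that general idea, not the authors' algorithm. It assumes a simple damped neighbor-averaging update and an uncertainty-based sampling rule, and the names `propagate`, `most_uncertain`, `prior`, and `damping` are hypothetical.

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]  # reviewer id -> neighboring reviewer ids


def propagate(graph: Graph,
              labels: Dict[str, float],
              prior: Dict[str, float],
              rounds: int = 20,
              damping: float = 0.5) -> Dict[str, float]:
    """Return a spam score in [0, 1] for every reviewer in the graph.

    labels: known reviewers, clamped to 1.0 (spammer) or 0.0 (benign).
    prior:  per-reviewer score from some classifier (assumed input);
            reviewers without a prior default to 0.5.
    Assumption: every neighbor id also appears as a key of `graph`.
    """
    scores = {u: labels.get(u, prior.get(u, 0.5)) for u in graph}
    for _ in range(rounds):
        updated = {}
        for u, neighbors in graph.items():
            if u in labels:
                # Labeled reviewers keep their known label.
                updated[u] = labels[u]
                continue
            base = prior.get(u, 0.5)
            if neighbors:
                # Blend the reviewer's own prior with the neighbors' mean score.
                mean = sum(scores[v] for v in neighbors) / len(neighbors)
                updated[u] = damping * base + (1 - damping) * mean
            else:
                updated[u] = base
        scores = updated
    return scores


def most_uncertain(scores: Dict[str, float], labeled: Set[str]) -> str:
    """Pick the unlabeled reviewer whose score is closest to 0.5."""
    candidates = [u for u in scores if u not in labeled]
    return min(candidates, key=lambda u: abs(scores[u] - 0.5))


if __name__ == "__main__":
    # Toy chain a - b - c - d plus an isolated reviewer e.
    toy = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"], "e": []}
    known = {"a": 1.0, "d": 0.0}
    result = propagate(toy, known, prior={})
    print(result)
    print("label next:", most_uncertain(result, set(known)))
```

In this toy graph, the reviewer next to the known spammer ends up with a higher score than the one next to the known benign reviewer, and the isolated reviewer, about whom the graph says nothing, is the one proposed for labeling next.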

