Fake reviews and review manipulation are growing problems on online marketplaces worldwide. Review hijacking is a new review manipulation tactic in which unethical sellers "hijack" an existing product page (usually one with many positive reviews) and then replace the product details, such as the title, photo, and description, with those of an entirely different product. Because the earlier reviews remain attached, the new item appears well-reviewed. However, there are no public datasets of review hijacking, and little is known in the literature about this tactic. Hence, this paper presents a three-part study: (i) we propose a framework for generating synthetically labeled review hijacking data by swapping products and reviews; (ii) using this data, we evaluate how well a Twin LSTM network and a BERT sequence-pair classifier distinguish legitimate reviews from hijacked ones; and (iii) we deploy the best-performing model on a collection of 31K products (with 6.5M reviews) from the original data, where we find hundreds of previously unknown examples of review hijacking.
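The swapping idea in step (i) can be illustrated with a minimal sketch. The abstract does not spell out the details of the framework, so everything below is an assumption made for illustration: the `Product` record, the `make_pairs` helper, and the choice to draw swapped reviews only from products in a different category are all hypothetical and are not taken from the paper.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Product:
    # Hypothetical record layout; the paper's actual data schema is not given here.
    product_id: str
    category: str
    title: str
    reviews: Tuple[str, ...]


def make_pairs(products: List[Product], seed: int = 0) -> List[Tuple[str, str, int]]:
    """Build (title, review, label) triples.

    label 0: the review originally belongs to the product.
    label 1: the review was swapped in from another product (synthetic hijack).
    """
    rng = random.Random(seed)
    pairs = []
    for product in products:
        # Assumption: a mismatched review comes from a different category,
        # so that the swapped pair is clearly about another kind of item.
        donors = [
            other for other in products
            if other.category != product.category and other.reviews
        ]
        for review in product.reviews:
            pairs.append((product.title, review, 0))
            if donors:
                donor = rng.choice(donors)
                pairs.append((product.title, rng.choice(donor.reviews), 1))
    return pairs
```

Under these assumptions, each original review yields one legitimate pair and, when a donor product exists, one synthetic hijacked pair. The resulting balanced set of labeled pairs is the kind of input a sequence-pair classifier could be trained on.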