The promising performance of Deep Neural Networks (DNNs) in text classification has attracted researchers to use them for fraud review detection. However, the lack of trusted labeled data has limited the performance of current solutions in detecting fraud reviews. The Generative Adversarial Network (GAN), as a semi-supervised method, has been shown to be effective for data augmentation. State-of-the-art solutions utilize GANs to overcome the data scarcity problem, but they fail to incorporate behavioral clues in fraud generation. Additionally, they overlook possible bot-generated reviews in the dataset. Finally, they suffer from a common limitation in the scalability and stability of the GAN, which slows down the training procedure. In this work, we propose ScoreGAN for fraud review detection, which makes use of both review text and review rating scores in the generation and detection process. Scores are incorporated into the loss function through Information Gain Maximization (IGM) for three reasons. First, it allows the generator to produce score-correlated reviews based on the scores given to it. Second, the generated reviews are employed to train the discriminator, so that the discriminator can correctly label possible bot-generated reviews through joint representations learned from the concatenation of the Global Vectors for Word Representation (GloVe) embeddings extracted from the text and the score. Finally, it can be used to improve the stability and scalability of the GAN. Results show that the proposed framework outperforms the existing state-of-the-art framework, namely FakeGAN, in terms of AP by 7\% and 5\% on the Yelp and TripAdvisor datasets, respectively.
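The joint representation mentioned in the abstract, a text embedding concatenated with the rating score, could be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the toy embedding table stands in for pretrained GloVe vectors, and the mean pooling, the score scaling, and the names `TOY_EMBEDDINGS` and `joint_representation` are all assumptions introduced here.

```python
import numpy as np

EMBED_DIM = 4  # assumption: toy size; pretrained GloVe vectors are much larger

# Hypothetical toy lookup table standing in for pretrained GloVe vectors.
rng = np.random.default_rng(0)
TOY_VOCAB = ["great", "food", "terrible", "service"]
TOY_EMBEDDINGS = {w: rng.normal(size=EMBED_DIM) for w in TOY_VOCAB}


def joint_representation(review, score, max_score=5.0):
    """Concatenate a mean word vector of the review with its scaled score."""
    tokens = review.lower().split()
    vectors = [TOY_EMBEDDINGS[t] for t in tokens if t in TOY_EMBEDDINGS]
    # Fall back to a zero vector when no token is in the toy vocabulary.
    text_vec = np.mean(vectors, axis=0) if vectors else np.zeros(EMBED_DIM)
    # Assumption: scale the rating to [0, 1] before appending it.
    score_feature = np.array([score / max_score])
    return np.concatenate([text_vec, score_feature])


features = joint_representation("Great food terrible service", score=2)
print(features.shape)  # (5,)
```

A discriminator would then take vectors like `features` as input, so that a mismatch between what the text says and the score it carries becomes something the model can learn from.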