Online reviews play an integral part in the success or failure of businesses. Before purchasing goods or services, customers first read the online comments submitted by previous customers. However, it is possible to artificially boost or hinder some businesses by posting fake reviews. This paper explores a natural language processing approach to identifying fake reviews. We present a detailed analysis of linguistic features for distinguishing fake from trustworthy online reviews. We study 15 linguistic features and measure their significance and importance for the classification schemes employed in this study. Our results indicate that fake reviews tend to include more redundant terms and pauses and generally contain longer sentences. Applying several machine learning classification algorithms, we were able to discriminate fake from real reviews with high accuracy using these linguistic features.
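The abstract does not specify how its 15 features are computed. As an illustration only, the sketch below approximates three of the signals it highlights (sentence length, redundant terms, and pauses) using simple definitions that are our own assumptions, not the paper's method: "redundancy" is taken as the share of word tokens that repeat an earlier token, and "pauses" are counted as commas, semicolons, and "..." ellipses. The names `extract_features` and `ReviewFeatures` are hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical tokenization rules; the paper's actual definitions are not given.
WORD_RE = re.compile(r"[a-z']+")            # lowercase word tokens
SENTENCE_SPLIT_RE = re.compile(r"[.!?]+")   # naive sentence boundaries
PAUSE_RE = re.compile(r"\.\.\.|[,;]")       # assumed pause markers


@dataclass
class ReviewFeatures:
    avg_sentence_length: float  # mean number of words per sentence
    redundancy_ratio: float     # share of word tokens that repeat an earlier token
    pause_count: int            # commas, semicolons and "..." ellipses

    def as_vector(self) -> list[float]:
        """Return the features as a numeric vector for a downstream classifier."""
        return [self.avg_sentence_length, self.redundancy_ratio, float(self.pause_count)]


def extract_features(review: str) -> ReviewFeatures:
    text = review.lower()
    words = WORD_RE.findall(text)
    # Keep only fragments that contain at least one word (drops trailing empties).
    sentences = [s for s in SENTENCE_SPLIT_RE.split(text) if WORD_RE.search(s)]
    avg_len = len(words) / len(sentences) if sentences else 0.0
    redundancy = (len(words) - len(set(words))) / len(words) if words else 0.0
    pauses = len(PAUSE_RE.findall(text))
    return ReviewFeatures(avg_len, redundancy, pauses)


f = extract_features(
    "Great hotel. The room was great, the staff was great, and the food was great."
)
print(f)  # 15 words over 2 sentences, 8 distinct words, 2 commas
```

In this sketch, vectors produced by `as_vector` would be passed to an off-the-shelf classifier; which algorithms and which full feature set the study used is described in the paper itself, not here.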