The popularity of online shopping is steadily increasing. At the same time, fake product reviews are published widely and have the potential to affect consumer purchasing behavior. In response, previous work has developed automated methods for the detection of deceptive product reviews. However, studies vary considerably in terms of classification performance, and many use data that contain potential confounds, which makes it difficult to determine their validity. Two possible confounds are data-origin (i.e., the dataset is composed of more than one source) and product ownership (i.e., reviews written by individuals who own or do not own the reviewed product). In the present study, we investigate the effect of both confounds on fake review detection. Using an experimental design, we manipulate data-origin, product ownership, review polarity, and veracity. Supervised learning analysis suggests that review veracity (60.26 - 69.87%) is somewhat detectable, but reviews additionally confounded with product ownership (66.19 - 74.17%) or with data-origin (84.44 - 86.94%) are easier to classify. Review veracity is most easily classified if confounded with product ownership and data-origin combined (87.78 - 88.12%), suggesting overestimations of the true performance in other work. These findings are moderated by review polarity.
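To make the supervised learning setup concrete, the sketch below shows one plausible way to estimate veracity classification accuracy with cross-validation. The pipeline (TF-IDF features with logistic regression) and the toy reviews and labels are illustrative assumptions only, not the classifiers, features, or data used in the study.

```python
# Hypothetical sketch of supervised veracity classification with
# cross-validated accuracy, in the spirit of the analysis described above.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Placeholder corpus: review texts with veracity labels (1 = deceptive, 0 = genuine).
reviews = [
    "This blender exceeded my expectations, highly recommend.",
    "Absolutely the best product ever made, buy it now!",
    "Worked fine for a month, then the motor burned out.",
    "Terrible quality, do not waste your money on this.",
]
labels = [0, 1, 0, 1]

# Text-classification pipeline: bag-of-ngrams TF-IDF features + logistic regression.
clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=1),
    LogisticRegression(max_iter=1000),
)

# Estimate accuracy with k-fold cross-validation (k=2 only because the toy corpus is tiny).
scores = cross_val_score(clf, reviews, labels, cv=2, scoring="accuracy")
print(f"Mean accuracy: {scores.mean():.2%}")
```

In a confound-aware design, the same pipeline would be trained and evaluated separately on datasets in which veracity is or is not entangled with data-origin and product ownership, so that differences in accuracy can be attributed to the confound rather than to the classifier.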