In this study, we address the problem of judging the credibility of movie reviews. The problem is challenging because even experts cannot clearly and efficiently judge the credibility of a movie review, and the number of movie reviews is very large. To tackle this problem, we propose historical credibility, which judges the credibility of a review based on the historical ratings and textual reviews written by its reviewer. To this end, we present three kinds of criteria that classify reviews as trusted or distrusted. We validate the effectiveness of the proposed historical credibility through extensive analysis. Specifically, we show that the characteristics of trusted and distrusted reviews are clearly distinguishable from three viewpoints: 1) distribution, 2) statistics, and 3) correlation. We then apply historical credibility to a weakly supervised model that classifies a given review as trusted or distrusted. First, we show that the approach is highly efficient because the entire data set is annotated according to the predefined criteria. Indeed, it annotates 6,400 movie reviews in only 0.093 seconds, which accounts for only 0.55% to 1.88% of the total learning time when LSTM and SVM are used as the learning models. Second, we show that the historical credibility-based classification model clearly outperforms the textual review-based classification model. Specifically, the classification accuracy of the former exceeds that of the latter by margins of up to 11.7% to 13.4%. In addition, we confirm that the accuracy of our classification model increases as the data size increases.
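The efficiency figures above can be checked with simple arithmetic. The following is a minimal sketch that uses only the numbers quoted in the abstract; the variable and function names are illustrative, and the abstract does not say which end of the percentage range belongs to LSTM and which to SVM, so the sketch only derives the implied range of total learning times.

```python
# Figures quoted in the abstract.
n_reviews = 6400
annotation_seconds = 0.093
# Annotation time as a share of total learning time (0.55% to 1.88%).
share_low, share_high = 0.0055, 0.0188


def implied_total_seconds(annotation_s: float, share: float) -> float:
    """Total learning time implied by the annotation time and its share of the total."""
    return annotation_s / share


# Reviews annotated per second by the rule-based labeling step.
throughput = n_reviews / annotation_seconds

# A smaller share implies a longer total learning time, and vice versa.
total_max = implied_total_seconds(annotation_seconds, share_low)
total_min = implied_total_seconds(annotation_seconds, share_high)

print(f"annotation throughput: {throughput:,.0f} reviews/s")
print(f"implied total learning time: {total_min:.1f} s to {total_max:.1f} s")
```

Under these figures, annotation runs at tens of thousands of reviews per second, and the total learning time it is compared against is on the order of seconds.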