The method of paired comparisons is an established method in psychology. In this article, it is applied to obtain continuous sentiment scores for words from comparisons made by test persons. We created an initial lexicon with $n=199$ German words from a two-fold all-pair comparison experiment with ten different test persons. Among the probabilistic models considered, the logistic model showed the best agreement with the results of the comparison experiment. The initial lexicon can then be used in different ways. One is to create special-purpose sentiment lexica by adding arbitrary words that test persons compare with some of the initial words. A cross-validation experiment suggests that only about 18 two-fold comparisons are necessary to estimate the score of a new, previously unseen word, provided the comparison words are selected by a modification of a method by Silverstein & Farrell. Another application of the initial lexicon is the evaluation of automatically created corpus-based lexica. Using such an evaluation, we compared the corpus-based lexica SentiWS, SenticNet, and SentiWordNet, of which SenticNet 4 performed best. This technical report is a corrected and extended version of a presentation given at the ICDM Sentire workshop in 2016.
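To make the idea of scoring words from paired comparisons concrete, the following is a minimal sketch of fitting a logistic paired-comparison model by maximum likelihood. It assumes the Bradley-Terry form, in which the probability that word $i$ is preferred over word $j$ is $1/(1+e^{-(s_i-s_j)})$; whether this matches the report's exact parametrization is an assumption. The function names, the small ridge term, the step-size rule, and the toy counts are all hypothetical and chosen only for illustration; they are not taken from the report.

```python
import numpy as np


def fit_logistic_scores(wins, n_iter=5000, ridge=1e-3):
    """Estimate one score per item from a matrix of pairwise preference counts.

    wins[i, j] is the number of times item i was preferred over item j.
    Assumed model (Bradley-Terry style): P(i over j) = sigmoid(s_i - s_j).
    The ridge term is an illustrative choice that keeps scores finite
    when one item wins every comparison.
    """
    wins = np.asarray(wins, dtype=float)
    total = wins + wins.T                      # comparisons made per pair
    step = 1.0 / max(total.sum(axis=1).max(), 1.0)  # conservative step size
    s = np.zeros(wins.shape[0])
    for _ in range(n_iter):
        diff = s[:, None] - s[None, :]         # s_i - s_j for all pairs
        p = 1.0 / (1.0 + np.exp(-diff))        # model probability i over j
        # Gradient of the log-likelihood: observed wins minus expected wins.
        grad = (wins - total * p).sum(axis=1) - ridge * s
        s += step * grad
        s -= s.mean()                          # scores are only defined up to a shift
    return s


def preference_probability(s_i, s_j):
    """Probability under the assumed logistic model that item i is preferred."""
    return 1.0 / (1.0 + np.exp(-(s_i - s_j)))


# Toy data (hypothetical): three words, ten comparisons per pair.
toy_wins = np.array([
    [0, 8, 9],   # word 0 preferred over word 1 eight times, over word 2 nine times
    [2, 0, 7],
    [1, 3, 0],
])
toy_scores = fit_logistic_scores(toy_wins)
```

Because only score differences enter the model, the sketch centers the scores at zero after each update; any other anchoring convention would describe the same fitted preferences.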