Lexicon-based sentiment analysis usually relies on identifying words to which a numerical sentiment value can be assigned. In principle, classifiers can be derived from these algorithms by comparison with human annotation, which is considered the gold standard. In practice this is difficult in languages such as Portuguese, where human-annotated texts are scarce. The next best step is therefore to compare different algorithms directly with one another, without reference to human annotation. In this paper we develop methods for a statistical comparison of algorithms that does not rely on human annotation or on known class labels. We motivate the use of marginal homogeneity tests, as well as log-linear models within the framework of maximum likelihood estimation. We also show how some uncertainties present in lexicon-based sentiment analysis may be similar to those that occur in human-annotated tweets. Finally, we show that the variability in the output of different algorithms is lexicon dependent, and we quantify this variability within the framework of log-linear models.
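To illustrate what a marginal homogeneity comparison between two algorithms can look like, the following is a minimal sketch and not necessarily the procedure used in the paper. It assumes three sentiment classes and uses the Stuart-Maxwell test, one common marginal homogeneity test for square contingency tables. The label names, function names, and counts are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Assumed label set; the abstract does not state the classes used.
LABELS = ("negative", "neutral", "positive")


def contingency_table(labels_a, labels_b):
    """Cross-tabulate paired outputs of two algorithms on the same texts.

    Rows index algorithm A's label, columns index algorithm B's label.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both algorithms must label the same texts")
    counts = Counter(zip(labels_a, labels_b))
    return [[counts[(r, c)] for c in LABELS] for r in LABELS]


def stuart_maxwell_3x3(table):
    """Stuart-Maxwell test of marginal homogeneity for a 3x3 table.

    Returns (statistic, p_value). Under the null hypothesis that both
    algorithms have the same marginal label distribution, the statistic
    is asymptotically chi-square with 2 degrees of freedom.
    """
    row = [sum(table[i]) for i in range(3)]
    col = [sum(table[i][j] for i in range(3)) for j in range(3)]

    # Differences of marginal totals for the first two categories
    # (the third is redundant because the differences sum to zero).
    d0, d1 = row[0] - col[0], row[1] - col[1]

    # Estimated covariance matrix of (d0, d1).
    s00 = row[0] + col[0] - 2 * table[0][0]
    s11 = row[1] + col[1] - 2 * table[1][1]
    s01 = -(table[0][1] + table[1][0])

    det = s00 * s11 - s01 * s01
    if det == 0:
        raise ValueError("covariance matrix is singular; test undefined")

    # Quadratic form d' S^{-1} d, written out for the 2x2 case.
    stat = (d0 * d0 * s11 - 2 * d0 * d1 * s01 + d1 * d1 * s00) / det

    # Chi-square survival function with 2 degrees of freedom is exp(-x/2).
    p_value = math.exp(-stat / 2)
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical counts: rows are algorithm A, columns are algorithm B.
    example = [[20, 10, 5],
               [3, 30, 15],
               [0, 5, 40]]
    stat, p = stuart_maxwell_3x3(example)
    print(f"statistic = {stat:.4f}, p-value = {p:.5f}")
```

A small p-value would suggest that the two algorithms distribute texts across the sentiment classes differently, without requiring any human-annotated labels. The closed-form p-value applies only to the three-class case; for more classes a general chi-square distribution function would be needed.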