Emotion lexica are commonly used resources to combat data poverty in automatic emotion detection. However, vocabulary coverage issues, differences in construction methods, and discrepancies in emotion frameworks and representations result in a heterogeneous landscape of emotion detection resources, calling for a unified approach to utilising them. To address this, we present an extended emotion lexicon of 30,273 unique entries, obtained by merging eight existing emotion lexica by means of a multi-view variational autoencoder (VAE). We showed that a VAE is a valid approach for combining lexica with different label spaces into a joint emotion label space with a chosen number of dimensions, and that these dimensions remain interpretable. We tested the utility of the unified VAE lexicon by employing the lexicon values as features in an emotion detection model. We found that the VAE lexicon outperformed the individual lexica but, contrary to our expectations, did not outperform a naive concatenation of lexica, although it did improve the naive concatenation when added as an extra lexicon. Furthermore, using lexicon information as additional features on top of state-of-the-art language models usually resulted in better performance than using no lexicon information.
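To make the "naive concatenation" baseline concrete, the following is a minimal sketch of how scores from lexica with different label spaces might be joined into one feature vector. It is an illustration under stated assumptions, not the authors' implementation: the two toy lexica, their scores, the zero-filling of missing words, and the averaging over tokens are all hypothetical choices made for this example.

```python
from typing import Dict, List, Sequence

Lexicon = Dict[str, List[float]]

# Hypothetical toy lexica with different label spaces (all scores are made up).
LEXICON_DIMENSIONAL: Lexicon = {"happy": [0.9, 0.7, 0.8], "terrible": [0.1, 0.6, 0.3]}
LEXICON_CATEGORICAL: Lexicon = {"happy": [1.0, 0.0], "afraid": [0.0, 1.0]}


def word_features(word: str, lexica: Sequence[Lexicon], dims: Sequence[int]) -> List[float]:
    """Concatenate a word's scores from every lexicon.

    Words missing from a lexicon are zero-filled (an assumption of this sketch).
    """
    features: List[float] = []
    for lexicon, dim in zip(lexica, dims):
        features.extend(lexicon.get(word, [0.0] * dim))
    return features


def text_features(tokens: Sequence[str], lexica: Sequence[Lexicon], dims: Sequence[int]) -> List[float]:
    """Average the concatenated word vectors over all tokens of a text."""
    total = [0.0] * sum(dims)
    for token in tokens:
        for i, value in enumerate(word_features(token, lexica, dims)):
            total[i] += value
    count = max(len(tokens), 1)  # avoid division by zero for empty texts
    return [value / count for value in total]


LEXICA = [LEXICON_DIMENSIONAL, LEXICON_CATEGORICAL]
DIMS = [3, 2]
vector = text_features(["happy", "afraid"], LEXICA, DIMS)
```

The resulting vector has one slot per label of every lexicon, so its width grows with each lexicon added. A learned joint label space, as described in the abstract, instead fixes the number of dimensions in advance.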