Sentiment-aware intelligent systems are essential to a wide array of applications, including marketing, political campaigns, recommender systems, behavioral economics, social psychology, and national security. These systems are driven by language models that broadly fall into two paradigms: 1. lexicon-based and 2. contextual. Although recent contextual models are increasingly dominant, lexicon-based models remain in demand because of their interpretability and ease of use. For example, lexicon-based models allow researchers to readily determine which words and phrases contribute most to a change in measured sentiment. A challenge for any lexicon-based approach is that the lexicon must be routinely expanded with new words and expressions, and crowdsourcing annotations for semantic dictionaries can be expensive and time-consuming. Here, we propose two models that use word embeddings and transfer learning to predict sentiment scores, augmenting semantic lexicons at a relatively low cost. Our first model establishes a baseline: a simple, shallow neural network initialized with pre-trained word embeddings, using a non-contextual approach. Our second model improves upon this baseline with a deep Transformer-based network that uses word definitions to estimate lexical polarity. Our evaluation shows that both models score new words with accuracy similar to that of reviewers from Amazon Mechanical Turk, but at a fraction of the cost.
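The baseline idea, regressing a word's sentiment score from its pre-trained embedding so that unscored words can be added to the lexicon, can be illustrated with a minimal sketch. It assumes frozen embeddings and a single linear layer trained by full-batch gradient descent on mean squared error; this is the simplest possible shallow network, not necessarily the architecture used here, and the Transformer-based second model is not sketched. The embeddings, the scores, the 1-9 scale, and the names `EMBEDDINGS`, `LEXICON`, `train_linear_scorer`, and `predict` are all hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]

# Hypothetical 3-dimensional "pre-trained" embeddings; real ones usually have hundreds of dimensions.
EMBEDDINGS: Dict[str, Vector] = {
    "happy": [1.0, 0.2, 0.1],
    "joy": [0.9, -0.1, 0.3],
    "sad": [-1.0, 0.1, -0.2],
    "awful": [-0.8, -0.3, 0.2],
    "table": [0.0, 0.5, -0.1],
    "delighted": [0.95, 0.0, 0.2],  # has an embedding but no crowdsourced score yet
}

# Hypothetical crowdsourced scores on an assumed 1-9 scale (5 = neutral).
LEXICON: Dict[str, float] = {
    "happy": 8.0,
    "joy": 7.7,
    "sad": 2.0,
    "awful": 2.6,
    "table": 5.0,
}


def predict(weights: Vector, bias: float, vector: Vector) -> float:
    """Score one word vector with the linear layer: w . x + b."""
    return sum(w * x for w, x in zip(weights, vector)) + bias


def train_linear_scorer(
    embeddings: Dict[str, Vector],
    lexicon: Dict[str, float],
    lr: float = 0.5,  # tuned for the toy data above; larger embeddings may need a smaller rate
    epochs: int = 5000,
) -> Tuple[Vector, float]:
    """Fit score ~ w . x + b by full-batch gradient descent on mean squared error.

    The embeddings stay frozen; only the linear layer's weights and bias are learned.
    """
    words = [word for word in lexicon if word in embeddings]
    if not words:
        raise ValueError("no lexicon word has an embedding")
    dim = len(embeddings[words[0]])
    n = len(words)
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for word in words:
            x = embeddings[word]
            error = predict(weights, bias, x) - lexicon[word]
            # Gradient of mean(error ** 2) w.r.t. w is mean(2 * error * x).
            for i in range(dim):
                grad_w[i] += 2.0 * error * x[i] / n
            grad_b += 2.0 * error / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


if __name__ == "__main__":
    weights, bias = train_linear_scorer(EMBEDDINGS, LEXICON)
    # Score a word that is missing from the lexicon.
    print(f"delighted: {predict(weights, bias, EMBEDDINGS['delighted']):.2f}")
```

Because the toy scores were constructed as 5 + 3 times the first embedding coordinate, the unscored word `delighted` should receive a score close to 7.85. In practice the vectors would come from a pre-trained embedding model, and a held-out set of crowdsourced scores would be needed to check whether predictions are comparable to those of human annotators.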