The use of user/product information in sentiment analysis is important, especially for cold-start users/products, whose number of reviews is very limited. However, current models do not address the cold-start problem, which is typical of review websites. In this paper, we present the Hybrid Contextualized Sentiment Classifier (HCSC), which contains two modules: (1) a fast word encoder that returns word vectors embedded with short- and long-range dependency features; and (2) Cold-Start Aware Attention (CSAA), an attention mechanism that accounts for the cold-start problem when attentively pooling the encoded word vectors. HCSC introduces shared vectors that are constructed from similar users/products and are used when the original distinct vectors do not carry sufficient information (i.e., in the cold-start case). The choice between the two is made by a frequency-guided selective gate vector. Our experiments show that, in terms of RMSE, HCSC performs significantly better on well-known datasets than the models it is compared with, despite having lower complexity, and can thus be trained much faster. More importantly, our model performs significantly better than previous models when the training data is sparse and has cold-start problems.
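To make the gating idea concrete, the following is a minimal NumPy sketch, not the formulation used in HCSC. It assumes that the shared vector is a softmax-weighted average of the other entities' distinct vectors (with dot-product similarity) and that the gate is a per-dimension sigmoid of the log review count. The function names and the gate parameters `w` and `b` are hypothetical and introduced only for illustration.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def shared_vector(distinct, idx):
    """Similarity-weighted average of the OTHER entities' distinct vectors.

    Assumption: similarity is a plain dot product. Needs at least two rows.
    """
    scores = distinct @ distinct[idx]
    scores[idx] = -np.inf  # exclude the entity itself from its own shared vector
    return softmax(scores) @ distinct


def frequency_gate(freq, w, b):
    """Per-dimension gate in (0, 1); grows with review count when w > 0.

    Assumption: w and b stand in for learned parameters.
    """
    return 1.0 / (1.0 + np.exp(-(w * np.log1p(freq) + b)))


def gated_vector(distinct, freqs, idx, w, b):
    """Blend distinct and shared vectors; rare entities lean on the shared one."""
    g = frequency_gate(freqs[idx], w, b)
    return g * distinct[idx] + (1.0 - g) * shared_vector(distinct, idx)


# Toy usage: five users with 4-dimensional vectors and very different review counts.
rng = np.random.default_rng(0)
user_vectors = rng.normal(size=(5, 4))
review_counts = np.array([0, 1, 3, 50, 1000])
w, b = np.full(4, 1.0), np.full(4, -2.0)

cold_user = gated_vector(user_vectors, review_counts, 0, w, b)
warm_user = gated_vector(user_vectors, review_counts, 4, w, b)
```

Under these assumptions, a user with no reviews receives a vector dominated by the shared component, while a user with many reviews keeps a vector close to the distinct one.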