Sentiment analysis is a key component of many text mining applications. Numerous sentiment classification techniques, both conventional and deep learning-based, have been proposed in the literature. Most existing methods assume that a high-quality training set is given. In real applications, however, constructing a training set with highly accurate labels is challenging, because text samples usually contain complex sentiment representations and their annotation is subjective. We address this challenge by introducing a new labeling strategy and using a two-level long short-term memory (LSTM) network to construct a sentiment classifier. Lexical cues, such as polar and privative words, are useful for sentiment analysis and have been exploited in conventional studies. To incorporate these cues effectively, we propose a new encoding strategy, $\rho$-hot encoding, which alleviates the drawbacks of one-hot encoding. We compile three Chinese data sets on the basis of our labeling strategy and proposed methodology. Experiments on the three data sets demonstrate that the proposed method outperforms state-of-the-art algorithms.
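The abstract does not define $\rho$-hot encoding, so the following is only an illustrative sketch under a stated assumption: each position of a one-hot lexical-cue vector is repeated $\rho$ times, so that the cue is not swamped when concatenated with a much longer dense word embedding. The cue inventory `CUES` and the helper names `one_hot`, `rho_hot`, and `encode_token` are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Sequence

# Hypothetical cue inventory: the abstract names only polar and privative words.
CUES = ["positive_polar", "negative_polar", "privative"]


def one_hot(cue: str) -> List[int]:
    """Standard one-hot vector over the cue inventory."""
    return [1 if c == cue else 0 for c in CUES]


def rho_hot(cue: str, rho: int) -> List[int]:
    """ASSUMED reading of rho-hot: repeat every one-hot position rho times."""
    if rho < 1:
        raise ValueError("rho must be a positive integer")
    vec: List[int] = []
    for bit in one_hot(cue):
        vec.extend([bit] * rho)
    return vec


def encode_token(embedding: Sequence[float], cue: Optional[str], rho: int) -> List[float]:
    """Concatenate a word embedding with the (assumed) rho-hot cue vector.

    Tokens that carry no lexical cue receive an all-zero cue segment.
    """
    cue_vec = rho_hot(cue, rho) if cue is not None else [0] * (len(CUES) * rho)
    return list(embedding) + cue_vec


if __name__ == "__main__":
    # Toy 2-dimensional embedding for a privative word such as "not".
    print(encode_token([0.1, 0.2], "privative", rho=2))
```

With `rho=1` this sketch reduces to ordinary one-hot encoding; larger values give the cue segment a larger share of the input vector. Whether this matches the authors' definition has to be checked against the full paper.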